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This  is  the  final  report  of  the  research  project  entitled  “3-D  Radar  Compression  Algorithm  Development 
for  Reconfigurable  Processing.”  During  this  period,  the  project  has  supported  the  Principal  Investigator 
to  supervise  two  graduate  students  to  study  two  technical  problems  related  to  wavelet-based  image 
compression.  In  the  remainder  of  this  document,  we  report  the  results  of  our  research. 


A.  Technical  Results 

Our  research  has  been  on  two  topics.  One  is  in  the  wavelet-based  image  (2-D)  and  video  (3-D)  denoising, 
and  the  other  is  in  feature  selection  for  object  tracking  using  the  Support  Vector  Machine  (SVM) 
approach.  The  activities  on  each  of  the  topics  are  described  separately  below. 


A.1  Wavelet  Transform  for  Denoising 

Image  denoising  is  a  difficult  technical  problem  and  has  been  studied  for  a  long  time.  It  becomes  very 
necessary  for  image  compression  because  noise  consumes  bits  but  does  not  contribute  to  the  quality. 
Only  when  the  noise  is  removed  can  image  compression  become  more  efficient  and  produce  better  quality 
of  the  image.  Conventional  approaches  use  low-pass  filters  to  remove  the  high-frequency  component  of 
the  noise  which  produce  non-optimal  results  because  many  image  features  have  high-frequency 
components  too.  We  have  developed  a  feature-based  wavelet  shrinkage  algorithm  for  both  image  and 
video  denoising  which  has  produced  very  good  results.  The  new  method  has  two  advantages: 

1 .  The  algorithm  is  based  on  wavelet  transform  which  is  also  the  fundamental  tool  for  image  compression 
including  radar  images;  and,  as  a  result,  not  so  much  new  software  has  to  be  developed. 
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2.  Denoising  is  based  on  the  features,  not  on  the  frequency;  consequently,  high-frequency  components  of 
an  image  are  not  removed  if  they  are  judged  to  be  features  and  that  contributes  to  the  quality  of  the 
image. 

Furthermore,  our  wavelet  shrinkage  algorithm  is  also  extended  to  video,  i.e.,  denoising  in  both  spatial  and 
temporal  domains. 


A.1.1  Wavelet  Shrinkage  Algorithm  for  Image  Denoising 

The  wavelet  shrinkage  algorithm  for  image  denoising  includes  a  few  steps.  First,  wavelet  transform  is 
applied  to  a  corrupted  image  to  decompose  the  image  into  multiresolution  subbands.  Next,  wavelet 
coefficients  in  each  subband  are  examined  according  to  two  criteria: 

1.  If  a  coefficient  is  larger  than  a  threshold  value  to  represent  the  features  of  the  image,  it  is  kept; 
otherwise,  it  is  considered  as  noise  and  is  removed. 

2.  For  the  remaining  coefficients,  the  algorithm  further  checks  to  see  if  each  coefficient  is  supported  by 
its  neighbors.  If  it  is,  the  coefficient  is  kept;  otherwise,  it  is  removed. 

Finally,  the  remaining  coefficients  are  used  to  reconstruct  the  image  which  is  the  denoised  image.  The 
new  method  uses  threshold  and  support  of  a  wavelet  coefficient  to  determine  its  usefulness  in  the  image 
reconstruction.  The  two  parameters  are  closely  related  to  features;  therefore,  the  algorithm  is  a  feature- 
based  denoising  algorithm. 

The  success  of  the  algorithm  depends  on  the  optimal  selection  of  r  (threshold)  and  s  (support), 
respectively.  These  two  values  are  determined  using  the  following  two  equations: 
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r  =  aTa„+b 


(1) 


5  =  asan+bs .  (2) 

where  as,  a  T,  bs,  and  bT  are  parameters  to  be  determined,  and  cr„  is  the  estimated  noise  level  of  the  image. 
We  use  a  set  of  test  images,  which  serve  as  training  samples,  to  determine  the  four  parameters.  We  use  a 
number  of  training  images  which  are  corrupted  by  a  known  level  of  noise  then  we  optimize  the  quality  of 
the  denoised  image  by  varying  as,  aT,  bs,  and  bT ,  respectively.  Those  values  which  produce  the  best 
quality  of  the  denoised  images  are  used  by  as,  av  bs,  and  bT  in  equations  (1)  and  (2).  Once  the  four 
parameters  are  determined,  the  above  two  equations  are  universally  used  to  denoise  other  images 
regardless  of  the  level  of  noise  and  the  content  of  the  image. 

The  principle  governing  this  parameter  selection  approach  is  called  machine  learning.  The  “machine” 
determines  the  value  of  parameters  from  n  training  samples  represented  by  vector  Vi,  i  =  1,  ...n  then 
assume  that  the  space  spanned  by  the  training  samples  covers  all  types  of  images. 


A.1 .2  The  Performance  of  the  Wavelet  Shrinkage  Algorithm  for  Image 
Denoising 

The  new  denoising  algorithm  is  compared  with  a  number  of  existing  denoising  algorithms.  The  result  is 
in  favor  of  the  new  one.  The  Peak-Signal-to-Noise-Ratio  (PSNR)  of  a  noise-corrupted  image  Pepper  is 
processed  by  various  denoising  methods.  As  shown  in  the  next  table,  our  method  has  a  higher  average 
improvement  in  PSNR  than  any  of  the  existing  methods. 
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Peppers”  Image 


Image  Input  PSNR 

22.6 

19.6 

16.6 

13.6 

Average 

Our  Algorithm 

31 

28.98 

27.17 

25.46 

28.15 

Pizurica  3 -band 

algorithm 

30.20 

28.60 

27.00 

25.20 

27.75 

Pizurica  2-band 

algorithm 

29.90 

28.20 

26.60 

24.90 

27.40 

Malfait  and  Roose 

algorithm 

28.60 

27.30 

26.00 

24.60 

26.63 

Mallat  and  Hwang 

algorithm 

28.20 

27.30 

27.10 

24.60 

26.80 

Matlabs’s  Sp. 

Adaptive  Wiener 

29.00 

27.10 

25.30 

23.30 

26.18 

The  references  of  the  other  methods  listed  in  the  above  table  can  be  found  in  the  attached  paper  entitled 
“Feature-Based  Wavelet  Shrinkage  Algorithm  for  Image  Denoising.” 


A.1 .3  Wavelet  Shrinkage  Algorithm  for  Video  Denoising 

The  feature -based  wavelet  shrinkage  algorithm  is  also  applied  to  video  including  both  spatial  (for 
individual  frames)  and  temporal  (for  multiple  consecutive  frames)  domains.  By  using  the  same  principal, 
the  features  in  the  temporal  domain  are  also  considered  in  video  noise  reduction.  The  features  in  the 
temporal  domain  are  the  motion  of  objects.  The  algorithm  takes  the  similar  steps: 
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a.  apply  the  2-D  wavelet  shrinkage  algorithm  to  each  individual  image, 

b.  apply  the  wavelet  transform  to  the  2-D  denoised  images  in  the  temporal  domain, 

c.  use  threshold  and  support  to  determine  if  the  wavelet  coefficients  in  the  temporal  domain  should  be 
kept  or  removed,  and 

d.  reconstruct  the  video  using  the  remaining  wavelet  coefficients  which  produced  the  denoised  video. 

After  the  wavelet  transform  in  the  temporal  domain,  the  magnitude  of  the  wavelet  coefficient  is  compared 
with  a  threshold  rz  as  mentioned  in  step  c  above.  The  threshold  is  determined  by  the  following  equation: 

vz  =  a<rn  +  J3M,  (3) 

where  cr„  is  the  estimated  level  of  noise,  and  Mj  is  the  motion  index  of  the  video  sequence.  The  higher 
the  level  of  noise  and  the  motion  index,  the  greater  the  threshold.  From  (3),  one  can  see  that  even  with  <yn 
and  Mi  given,  one  still  has  to  find  a  and  f5  to  obtain  the  threshold.  To  find  the  best  a  and  fl  we  use  the 
same  machine  learning  approach  as  for  image  denoising;  that  is,  a  number  of  training  video  samples  are 
first  corrupted  by  a  known  level  of  noise.  We  optimize  the  quality  of  denoised  video  by  using  equation 
(3)  and  by  varying  a  and  fi  respectively.  The  a  and  ft  which  produce  the  best  denoising  effect  are  chosen 
for  use  in  the  equation  (3)  for  denoising  corrupted  videos  universally.  We  assume  that  the  space  spanned 
by  the  training  samples  covers  a  wide  scope  of  videos.  The  question  is  what  kind  of  videos  one  should 
use  to  train  the  machine  such  that  the  machine  trained  is  general  enough.  This  question  still  needs  to  be 
studied  in  both  theory  and  experiments. 
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We  have  produced  three  papers  (by  Eric  J.  Balster,  Yuan  F.  Zheng,  and  Robert  L.  Ewing)  for  this 
denoising  work.  The  details  of  the  algorithms  described  above  can  be  found  in  the  three  papers  which  are 
attached  as  an  appendix  of  this  report. 

A. 2  Feature  Selection  for  Efficient  Object  Classification  and  Tracking 

A.2.1  Motivation 

Another  topic  we  studied  for  this  project  is  video  object  segmentation  and  tracking.  The  idea  is  to  track 
video  objects  in  the  video  sequence  such  that  objects  can  be  grouped  together  for  efficient  3-D  object- 
based  compression.  A  paper  on  the  video  object  segmentation  and  tracking  entitled  “A  GM-Based  Multi- 
Layer  Method  for  Object  Tracking  In  Video  Sequence”  was  presented  at  the  IEEE  International 
Conference  on  Information  Technology:  Research  and  Education,  August  10-13,  2003  in  Newark,  New 
Jersey.  Based  on  that  work,  our  new  study  has  been  on  optimal  feature  representation  of  video  objects. 
The  purpose  is  to  reduce  the  dimension  of  features  which  have  to  be  used  to  represent  and  classify  the 
object.  As  a  result,  the  computation  for  training  the  classifier,  which  is  based  on  SVM,  can  be 
significantly  reduced. 

For  representing  a  video  object,  we  divide  a  video  frame  into  blocks  and  name  each  block  the  object 
block,  if  the  block  is  a  part  of  the  object,  and  background  block,  if  it  is  a  part  of  the  background.  The 
features  of  each  block  are  extracted  by  applying  the  Discrete  Cosine  Transform  (DCT)  to  each  block.  The 
run  time  analysis  shows  that  the  feature  extraction  operation  takes  nearly  99.7%  of  the  whole  run  time  for 
training  the  SVM,  for  which  DCT  is  the  major  contributor.  To  further  reduce  the  processing  time,  we 
need  a  feature  selection  method  to  select  only  a  subset  of  the  DCT  coefficients  that  are  most  essential  for 
cbject/background  separation. 
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In  general,  feature  selection  approaches  can  be  grouped  into  two  categories:  filter  methods  and  wrapper 
methods  [1],  The  major  difference  between  the  two  is  that  the  wrapper  methods,  which  evaluate  the 
“goodness”  of  the  selected  feature  subset  directly  by  the  classification  accuracy,  are  classifier-dependent 
while  the  filter  methods  are  not.  On  the  contrary,  the  filter  methods  estimate  the  classification 
performance  by  some  indirect  assessments  such  as  distance  measurements,  which  reflect  how  well  the 
classes  separate  from  each  other.  Intuitively,  the  wrapper  methods  may  yield  better  performance  and 
actually  many  experimental  results  reported  so  far  are  in  favor  of  the  wrapper  methods  [2,  3].  However, 
given  the  fact  that  training  just  a  single  SVM  would  impose  a  lot  of  computation  when  the  number  of 
training  samples  is  large,  the  integration  of  SVM  with  the  wrapper  methods,  which  calls  for  multiple 
times  of  training,  is  computationally  infeasible.  We,  by  taking  the  advantage  of  the  existence  of  support 
vectors  of  SVM,  designed  an  expedited  wrapper  method  for  SVM  which  is  named  Filtered  and  Supported 
Sequential  Forward  Search  (FSSFS). 


A.2.2  The  FS_SFS  Search  Algorithm 

The  outline  of  FS  SFS  is  shown  in  Fig.  1.  The  filtering  part  in  our  approach,  acting  in  a  generic  way 
similar  to  a  filter  method,  ranks  features  without  involving  the  classifier.  The  features  with  relatively 
high  ranks  are  considered  as  “informative”  feature  candidates  and  then  re-studied  by  the  wrapper  part  that 
further  investigates  their  contributions  to  a  specific  classifier.  In  other  words,  some  features  are  discarded 
before  the  wrapper  part  begins;  consequently,  the  total  number  of  training  is  reduced.  Furthermore,  an 
active  training  set  is  dynamically  maintained  as  the  estimated  candidates  of  the  support  vectors  and  each 
SVM  are  trained  by  using  this  reduced  subset  rather  than  the  whole  original  training  set.  In  this  way,  we 
are  able  to  reduce  the  training  complexity  of  a  single  SVM.  With  the  framework  determined,  the  feature 
selection  problem  is  reduced  to  a  search  problem  to  find  the  optimal  subset,  and  we  adopt  a  suboptimal 
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search  method  called  sequential  forward  search  (SFS)  algorithm  for  its  simplicity  and  effectiveness 
proven  in  many  applications. 


A.2.3  The  Performance  of  the  FS_SFS  Method 

We  test  FSSFS  on  four  real  data  sets  that  are  from  the  widely  used  UCI  (University  of  California,  Irvine) 
repository  of  machine  learning  [4],  and  the  results  are  very  encouraging.  The  preliminary  results  show 
that  FS  SFS  is  able  to  reduce  the  computation  time  of  SFS  by  30%  without  sacrificing  the  accuracy  of 
either  the  selection  or  the  classification.  Our  next  plan  is  to  apply  FS  SFS  to  our  classification-based 
object  extraction  approach.  As  mentioned  before,  the  DCT  is  the  major  contributor  to  the  run  time  and 
we  want  to  skip  the  transformation  of  some  DCT  coefficients  that  are  not  essential  for  object/background 
separation.  By  doing  so,  the  number  of  DCT  operations  can  be  reduced  and,  as  a  result,  better  efficiency 
can  be  achieved.  The  details  of  this  new  algorithm  can  be  found  in  the  attached  papers  by  Yi  Liu  and 
Yuan  F.  Zheng. 


filter  part  wrapper  part 


Fig.  1:  The  structure  of  the  FS  SFS  method 


B.  Products 


B.1  Papers 


Five  technical  papers  have  been  produced  by  this  research  and  are  listed  below: 

1.  Eric  J.  Balster,  Yuan  F.  Zheng,  and  Robert  L.  Ewing,  “Feature-Based  Wavelet  Shrinkage  Algorithm 
for  Image  Denoising,”  submitted  to  IEEE  Transactions  on  Signal  Processing  for  publication. 

2.  Eric  J.  Balster,  Yuan  F.  Zheng,  and  Robert  L.  Ewing,  “Combined  Spatial  and  Temporal  Domain 
Wavelet  Shrinkage  Algorithm  for  Video  Denoising,”  submitted  to  IEEE  Transactions  on  Circuits  and 
Systems  for  Video  Technology  for  publication. 

3.  Eric  J.  Balster,  Yuan  F.  Zheng,  and  Robert  L.  Ewing,  “Combined  Spatial  and  Temporal  Domain 
Wavelet  Shrinkage  Algorithm  for  Video  Denoising,”  accepted  for  presentation  at  IEEE  International 
Symposium  on  Communication  Systems,  Networks  and  Digital  Signal  Processing,  Newcastle,  UK, 
July  20-22,  2004. 

4.  Yi  Liu  and  Yuan  F.  Zheng,  “FSSFS:  A  Novel  Feature  Selection  Method  for  Support  Vector 
Machine,”  submitted  to  Computer  Vision  and  Image  Understanding  for  publication. 

5.  Yi  Liu  and  Yuan  F.  Zheng,  “FS  SFS:  A  Novel  Feature  Selection  Method  for  Support  Vector 
Machine,”  accepted  for  presentation  at  IEEE  International  Conference  on  Acoustics,  Speech,  and 
Signal  Processing,  Montreal,  Quebec,  Canada,  May  17-21,  2004. 


B.2  Software  Package 

The  wavelet  compression  software,  developed  by  the  OSU  wavelet  compression  group,  has  been 
modified  to  include  the  image  and  video  denoising  algorithms.  Real-time  demonstration  of  the  software 
for  video  compression  and  decompression  was  performed  successfully.  Compression  ratio  is  significantly 
increased  with  the  denoising  algorithm,  and  the  quality  of  originally  corrupted  images  is  improved  as 
well. 
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C.  Summary 


In  this  research  project,  the  OSU  wavelet  compression  group  in  the  Department  of  Electrical  and 
Computer  Engineering  studied  two  technical  topics:  feature-based  wavelet  shrinkage  algorithm  for  image 
and  video  denoising  and  feature  selection  for  efficient  training  of  SVM.  We  derived  mathematical 
formulations  for  both  algorithms  as  well  as  conducted  experiments  to  verify  the  algorithms.  Both 
algorithms  have  generated  better  results  than  the  existing  and  conventional  methods.  These  two 
algorithms  will  improve  the  efficiency  of  our  wavelet  compression  software  significantly.  The  video 
object  classifying  algorithm  may  also  be  used  in  target  recognition  and  tracking  that  may  be  significant  to 
many  Air  Force  applications.  We  have  also  produced  five  technical  papers  for  this  project.  In  addition, 
our  software  is  improved  by  integrating  the  denoising  algorithm. 
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Abstract 

A  selective  wavelet  shrinkage  algorithm  for  digital  image  denoising  is  presented.  The  performance  of  this  method 
is  an  improvement  upon  other  methods  proposed  in  the  literature  and  is  algorithmically  simple  for  large  computa¬ 
tional  savings.  The  improved  performance  and  computational  speed  of  the  proposed  wavelet  shrinkage  algorithm 
is  presented  and  experimentally  compared  with  established  methods.  The  denoising  method  incorporated  in  the 
proposed  algorithm  involves  a  two-threshold  validation  process  for  real-time  selection  of  wavelet  coefficients.  The 
two-threshold  criteria  selects  wavelet  coefficients  based  on  their  absolute  value,  spatial  regularity,  and  regularity 
across  multiresolution  scales.  The  proposed  algorithm  takes  image  features  into  consideration  in  the  selection  pro¬ 
cess.  Statistically,  most  images  have  regular  features  resulting  in  connected  subband  coefficients.  Therefore,  the 
resulting  subbands  of  wavelet  transformed  images  in  large  part  do  not  contain  isolated  coefficients. 

In  the  proposed  algorithm,  coefficients  are  selected  due  to  their  magnitude,  and  only  a  subset  of  those  selected 
coefficients  which  exhibit  a  spatially  regular  behavior  remain  for  image  reconstruction.  Therefore,  two  thresholds 
are  used  in  the  coefficient  selection  process.  The  first  threshold  is  used  to  distinguish  coefficients  of  large  magnitude, 
and  the  second  is  used  to  distinguish  coefficients  of  spatial  regularity.  The  performance  of  the  proposed  wavelet 
denoising  technique  is  an  improvement  upon  several  other  established  wavelet  denoising  techniques  as  well  as  being 
computationally  efficient  to  facilitate  realtime  image  processing  applications. 

Keywords — Image  denoising,  selective  wavelet  shrinkage,  two-threshold  criteria. 

I.  Introduction 

The  recent  advancement  in  multimedia  technology  has  promoted  an  enormous  amount  of  research  in  the  area  of 
image  and  video  processing.  Included  in  the  many  image  and  video  processing  applications  such  as  compression, 
enhancement,  and  target  recognition  are  preprocessing  functions  for  noise  removal.  Noise  removal  is  one  of  the 
most  common  and  important  processing  steps  in  many  image  and  video  systems. 

Because  of  the  importance  and  commonality  of  preprocessing  in  most  image  and  video  systems,  there  has  been 
an  enormous  amount  of  research  dedicated  to  the  subject  of  noise  removal,  and  many  different  mathematical  tools 
have  been  proposed.  Variable  coefficient  linear  filters  [5, 17,20,27],  adaptive  nonlinear  filters  [9, 16, 18,28],  DCT 
based  solutions  [11],  cluster  filtering  [26],  genetic  algorithms  [25],  fuzzy  logic  [12,22],  etc.  have  all  been  proposed 
in  the  literature. 

The  wavelet  transform  has  also  been  used  to  suppress  noise  in  digital  images.  It  has  been  shown  that  the  reduction 
of  absolute  value  in  wavelet  coefficients  is  successful  in  signal  restoration  [15],  This  process  is  known  as  wavelet 
shrinkage.  Other  more  complex  denoising  techniques  select  or  reject  wavelet  coefficients  based  on  their  predicted 
contribution  to  reconstructed  image  quality.  This  process  is  known  as  selective  wavelet  shrinkage,  and  many  works 
have  used  it  as  the  preferred  method  of  image  denoising.  Preliminary  methods  predict  the  contribution  of  the  wavelet 
coefficients  based  on  the  magnitude  of  the  wavelet  coefficients  [24],  and  others  based  on  intra-scale  dependencies 
of  the  wavelet  coefficients  [3,7, 13, 15].  More  recent  denoising  methods  are  based  on  both  intra-  and  inter-scale 
coefficient  dependencies  [6,8, 10, 14, 19], 

Mallat  and  Hwang  prove  the  successful  removal  of  noise  in  signals  via  the  wavelet  transform  by  selecting  and 
rejecting  wavelet  coefficients  based  on  their  Lipschitz  (Holder)  exponents  [15].  The  Holder  exponent  is  a  measure 
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of  regularity  in  a  signal,  and  it  may  be  approximated  by  the  evolution  of  wavelet  coefficient  ratios  across  scales. 
Thus,  this  regularity  metric  used  in  selecting  those  wavelet  coefficients  which  are  to  be  used  in  reconstruction,  and 
those  which  are  not.  Although  this  fundamental  work  in  image  denoising  is  successful  in  the  removal  of  noise,  its 
application  is  broad  and  not  focused  on  image  noise  removal,  and  the  results  are  not  optimal. 

Malfait  and  Roose  refined  the  selective  shrinkage  denoising  approach  by  applying  a  Bayesian  probabilistic  for¬ 
mulation,  and  modeling  the  wavelet  coefficients  as  Markov  random  sequences  [14],  This  method  is  focused  on 
image  denoising  and  its  results  are  an  improvement  upon  [15],  The  Lipschitz  (Holder)  exponents  are  roughly 
approximated  by  the  evolution  of  coefficient  values  across  scales,  i.e. 

—  -J-  Afc+1'” 

"H,n  —  p_l  2_,k=l  \k  n  ’ 


where  m^n  is  the  approximated  Holder  exponent  of  position  n  of  scale  l  and  A k,n  is  the  wavelet  coefficient  of  scale 
k  and  position  n.  The  rough  approximation  is  refined  by  assuming  that  the  coefficient  values  are  well  modeled 
as  a  Markov  chain,  and  the  probability  of  a  coefficients  contribution  to  the  image  can  be  well  approximated  by 
the  Holder  exponents  of  neighboring  coefficients.  Coefficients  are  then  assigned  binary  labels  a;*  n  of  position  n 
depending  on  their  predicted  retention  for  reconstruction  (xk.n  =  1),  or  predicted  removal  (xk,n  =  0).  The  binary 
labels  are  then  randomly  and  iteratively  switched  until  P(X\AI)  is  maximized,  where  Xk,n  G  X  and  mk,n  G  M. 
The  coefficients  are  modified  by  Xjfn  =  ^k,nP(xk,n  =  1|M),  and  the  denoised  image  is  formed  by  the  inverse 
wavelet  transform  of  the  modified  coefficients.  Each  coefficient  is  reduced  in  magnitude  depending  on  the  probable 
contribution  to  the  image,  i.e.  P(xk,n  =  1|M). 

Later,  Pizurica,  et  al.  ( [19])  continued  on  the  work  done  by  [14]  by  using  a  different  approximation  of  the  Holder 
exponent  given  by 


n  —  -X- 

rl,n  —  p_l  Xk-l 


Ik  +  1,n 

Ik,n 


where 


Ik,n  ~  YhteC(k,n)  I^mI- 

Pk  n  is  the  approximation  of  the  Holder  exponent,  and  C(k.  n)  is  the  set  of  coefficients  surrounding  A k,n-  This 
work  applies  the  same  probabilistic  model  as  [14]  using  the  new  approximation  of  the  Holder  exponent.  Coef¬ 
ficients  are  assigned  binary  labels,  Xk,n,  depending  on  their  predicted  retention  for  reconstruction  (xk,n  =  1), 
or  predicted  removal  ( Xk,n  =  0).  The  binary  labels  are  then  randomly  and  iteratively  switched  until  P(X\M) 
is  maximized.  Unlike  [14],  the  significance  measure  of  a  coefficient,  M,  is  not  merely  its  Holder  exponent,  but 
evaluated  by  the  magnitude  of  the  coefficients  as  well  as  its  Holder  approximation,  i.e.  fM\ximk,n\xk,n )  = 
fA\x(^k,n\xk,n)fn\x(Pk,n \xk,n)-  Thus  a  joint  measure  of  coefficient  significance  is  developed  based  on  both 
the  Holder  exponent  approximation  and  the  magnitude  of  the  wavelet  coefficient.  As  in  [14],  the  coefficients  are 
modified  by  A£e™  =  A k,nP(xk,n  =  1| M). 

Although  both  algorithms  in  [14]  and  [19]  show  promising  results  in  denoised  image  quality,  the  iterative  proce¬ 
dure  necessary  to  maximize  the  probability  P(X\M)  adds  computational  complexity  making  the  processing  times 
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of  the  algorithms  impractical  for  most  image  and  video  processing  applications.  Also,  the  Markov  Random  Field 
(MRF)  model  used  in  the  calculation  of  P(X\M)  is  not  appropriate  for  analysis  of  wavelet  coefficients  because  it 
ignores  the  influence  of  non-neighboring  coefficients.  The  MRF  model  is  strictly  used  for  simplicity  and  conceptual 
ease  [14]. 

From  the  review  of  the  literature,  one  can  see  that  image  denoising  remains  to  be  an  active  and  challenging  topic 
of  research.  The  major  challenge  lies  in  the  fact  that  one  does  not  know  what  the  original  signal  is  for  a  corrupted 
image.  The  performance  of  a  method,  on  the  other  hand,  can  only  be  measured  by  comparing  the  denoised  image 
with  its  origin.  In  this  paper,  we  propose  a  new  denosing  approach  which  consists  of  two  components.  The  first 
is  the  selective  wavelet  shrinkage  method  for  denoising,  and  the  second  is  a  new  threshold  selecting  method  which 
makes  use  of  test  images  as  training  samples. 

In  general,  selective  shrinkage  methods  are  comprised  of  three  processing  steps.  First,  a  corrupted  image  is 
decomposed  into  multiresolution  subbands  via  the  wavelet  transform.  Next,  wavelet  coefficients  are  modified 
based  upon  certain  criteria  to  predict  their  importance  in  reconstructed  image  quality.  Finally,  the  denoised  image 
is  formed  by  reconstructing  the  modified  coefficients  via  the  inverse  wavelet  transform.  The  processing  step  of 
most  cost  computationally  in  the  methods  of  [14]  and  [19]  and  greatest  importance  in  denoising  performance  is  the 
coefficient  modification  process,  which  calls  for  effective  and  efficient  criteria  to  modify  wavelet  coefficients.  To 
improve  performance,  this  paper  presents  a  new  coefficient  selection  process  which  uses  a  two-threshold  criteria  to 
non-iteratively  select  and  reject  wavelet  coefficients.  The  two-threshold  selection  criteria  results  in  an  effective  and 
computationally  simple  coefficient  selection  process. 

The  threshold  selection  method  is  presented  which  is  based  on  minimizing  the  error  between  the  wavelet  coef¬ 
ficients  of  the  denoised  image  and  the  wavelet  coefficients  of  an  optimally  denoised  image  produced  by  a  method 
using  supplemental  information.  The  supplemental  information  provided  produces  a  denoised  image  that  is  far  su¬ 
perior  than  any  method  which  does  not  utilize  supplemental  information.  Thus,  the  image  produced  by  the  method 
utilizing  supplemental  information  is  referred  to  as  an  optimally  denoised  image.  Using  several  test  cases,  the 
threshold  values  which  produce  the  minimum  difference  between  the  wavelet  coefficients  of  the  denoised  image 
and  the  wavelet  coefficients  of  the  optimally  denoised  image  are  chosen  as  the  threshold  values  for  the  general  case. 

The  two-threshold  coefficient  selection  method  results  in  a  denoising  algorithm  which  gives  improved  results 
upon  those  provided  by  [14, 19],  but  without  the  computational  complexity.  The  two-threshold  requirement  inves¬ 
tigates  the  regularities  of  wavelet  coefficients  both  spatially  and  across  scales  for  predictive  coefficient  selection, 
providing  selective  wavelet  shrinkage  to  non-decimated  wavelet  subbands. 

Following  the  Introduction,  Section  II  gives  theory  on  the  2D  non-decimated  wavelet  analysis  and  synthesis 
filters.  Section  III  then  describes  the  coefficient  selection  process  prior  to  selective  wavelet  shrinkage.  Section 
IV  gives  testing  results  for  parameter  selection.  Section  V  gives  the  estimation  algorithms  for  proper  parameter 
selection,  and  Section  VI  gives  the  results.  Section  VII  concludes  the  paper. 
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II.  2D  Non-Decimated  Wavelet  Analysis  and  Synthesis 

To  facilitate  the  discussion  of  the  proposed  method,  non-decimated  wavelet  filterbank  theory  is  presented.  In 
certain  applications  such  as  signal  denoising,  it  is  not  desirable  to  downsample  wavelet  coefficients  after  decompo¬ 
sition,  as  in  the  tradition  wavelet  filterbank.  The  spatial  resolution  of  the  coefficients  is  degraded  due  to  downsam¬ 
pling.  Therefore,  for  the  non-decimated  case,  each  subband  contains  the  same  number  of  coefficients  as  the  original 
signal. 

Let  ak[n]  and  dk[n\  be  scaling  and  wavelet  coefficients,  respectively,  of  scale  k  and  position  n.  Also  let  h[-] 
and  g[-]  be  the  filter  coefficients  corresponding  to  the  low-pass  and  high-pass  filter,  respectively,  of  the  wavelet 
transform. 

Thus, 

ak{2  k+1n\=ak[n] 

Xk[2k+1n]  =  dk[n\i 

where  ak[-]  are  the  non-decimated  scaling  function  coefficients,  and  A/i; [■]  are  the  non-decimated  wavelet  coeffi¬ 
cients.  Equation  1  is  substituted  into  the  scaling  analysis  filterbank  equation  to  find  the  non-decimated  filterbank 
equation: 

ak+ 1  [n]  =  h[m]ak[m  -  2 n\ 

ak+ J2fc+2n]  =  Ylm  h[m]ak[2k+1  (rn  -  2 n)]  (2) 

afe+i  N  =  Em  h[m]ak[2k+1m  -  n\. 

The  2k+1  scalar  introduced  into  Equation  2  is  equivalent  to  upsampling  h[-]  by  2k+1  prior  to  its  convolution  with 
ak[-\.  Similarly  Equation  1  is  substituted  into  the  wavelet  analysis  filterbank  equation  to  obtain 

Afc+Jn]  =  Y<m9[™\otk[2k+1m  -  n}.  (3) 

Figure  1  gives  a  block  diagram  of  the  non-decimated  wavelet  decomposition. 


Fig.  1.  Non-decimated  wavelet  decomposition 

The  synthesis  of  the  non-decimated  wavelet  transform  also  differs  from  the  downsampled  case.  From  the  wavelet 
synthesis  filterbank  equation  we  obtain, 

ak[2n]  =  ^2  h[2(n  -  m)]ak+1  [m\  +  ^  g[2(n  -  m)\dk+1  [to].  (4) 

m  m 
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Substituting  (p  =  n  —  m )  we  obtain, 

°fc[2n]  =  55  hi2p\ak+ 1  [« -  +  55  s'PpK+i  in  -  p]- 

p  p 

Substituting  Equation  1  into  Equation  5, 

ak[2k+2n]  =  Zph{2p\ak+1{2k+\n-p)\ 

+  EP  s[2p]  Afc+1  [2fc+2(n  -  y)] 

and 


(5) 


(6) 


(7) 


ak[n\  =  h[2p]ak+1  [n  -  2 k+2p\ 

+  Ep5[2P]E+1[n-  2k+2p). 

Looking  at  Equation  7  samples  are  being  thrown  away  by  downsampling  ak+1  [•]  and  Xk  [•]  by  2  prior  to  convolu¬ 
tion.  Because  the  downsampling  in  the  analysis  filters  is  eliminated,  a  downsample  by  2  is  shown  in  the  synthesis 
equation.  Equation  7.  If  a  downsample  by  2  is  not  performed,  i.e.  (to  =  2 p),  then  we  must  divide  by  2  to  provide 
power  equality.  That  is, 

Oik[n]  =  \  J2m  h[m\ak+1  [n  -  2fc+1TO] 


+  7  EmSHVi  [n  -  2fe+1m], 


(8) 


2  Z-^m  j  i  i  K+i 

Figure  2  gives  a  block  diagram  of  the  non-decimated  wavelet  transform  synthesis. 


<W«/ 

/jk  llnl 
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Fig.  2.  Non-decimated  wavelet  synthesis 


The  above  analysis  is  expanded  to  the  two-dimensional  case.  For  a  2-D  discrete  signal,  /(•), 

Uu,k+ 1  [x,  y\  =  En,m  h[n]h[rn]aii,k[2k+1rn  -  x,  2 k+1n  -  y] 

Aw,fe+1  [*,  y]  =  E „,m  h {n]g[m]otihk[2k+lm  -  x,  2k+1n  -  y] 

Ajm+1  [at,  y]  =  En,m  5W^Na«,fc[2fe+1m  -  a:,  2fe+1?r  -  y] 

\h,k+1  [x,  y\  =  En,m  -  ar,  2fc+1n  -  y], 

where 

[*>2/]  =  f(x,y).  (10) 

The  four  coefficient  sets  given  in  Equation  9  is  referred  to  as  the  low-low  band,  autk+1[-],  the  high-low  band, 
A the  low-high  band,  Xih,k+1  [■],  and  the  high-high  band,  A h.h,k+1[-\-  The  subbands  are  named  due  to  the 
order  in  which  the  scaling  and/or  the  wavelet  filters  process  the  scaling  function  coefficients. 
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For  the  synthesis  of  /(•), 

aii,k[x,y\  =  \  Y.m,nHpAh\n]au,k+i\x  ~  2k+1m,y-  2fe+1n] 

+  3  Em,n  [*  -  2fc+1m,  y  -  2k+1n] 

+  3  Em,™  pN/i[n]AZh)fc+1  [a;  -  2k+1m,  y  -  2 k+1n] 

+  3  T,m,n9[m]g[n]Xhh,k+i[x  -  2 k+1m,y  -  2 k+1n\ 

Equation  9  is  recursively  computed  to  produce  several  levels  of  wavelet  coefficients,  and  reconstruction  of  the  2-D 
signal,  /(•)  is  accomplished  by  the  recursive  computation  of  Equation  11. 

The  non-decimated  wavelet  transform  has  a  number  of  advantages  in  signal  denoising  over  the  traditional  deci¬ 
mated  case.  One,  each  subband  in  the  wavelet  decomposition  is  equal  in  size,  and  thus  it  is  more  straightforward  to 
find  the  spatial  relationships  between  subbands.  Two,  the  spatial  resolution  of  each  of  the  subbands  is  preserved  by 
eliminating  the  downsample  by  two.  Because  of  the  elimination  of  the  downsampler,  information  contained  in  the 
wavelet  coefficients  is  redundant,  and  this  redundancy  is  exploited  to  determine  the  coefficients  comprised  of  noise 
and  the  coefficients  comprised  of  feature  information  contained  in  the  original  image. 

III.  Retention  of  Feature- Supporting  Wavelet  Coefficients 

One  of  the  many  advantages  of  the  wavelet  transform  over  other  mathematical  transformations  is  the  retention 
of  the  spatial  relationship  between  pixels  in  the  original  image  by  the  coefficients  in  the  wavelet  domain.  These 
spatial  relationships  represent  features  of  the  image  and  should  be  retained  as  much  as  possible  during  denoising. 
In  general,  images  are  comprised  of  regular  features,  and  the  resulting  wavelet  transform  of  an  image  generates 
few,  large,  spatially  contiguous  coefficients  which  are  representative  of  the  features  given  in  the  original  image.  We 
refer  to  the  spatial  contiguity  of  the  wavelet  coefficients  as  spatial  regularity. 

The  concept  of  spatial  regularity  has  the  similar  function  as  that  of  signal  regularity  in  previous  denoising  ap¬ 
proaches  for  selecting  the  wavelet  coefficients.  The  key  difference  is  that  spatial  correlation  of  the  features  are 
represented  by  connectivity  of  wavelet  coefficients  rather  than  statistical  models  such  as  Markov  random  sequences 
[14, 19]  or  (Holder)  exponents  [14, 15, 19]  in  previous  methods.  These  models  are  often  computationally  compli¬ 
cated  and  still  do  not  reflect  the  geometry  of  the  features  explicitly.  As  a  result  the  current  method  has  a  better 
performance  even  with  a  much  simpler  computation. 

Because  of  spatial  regularity,  the  resulting  subbands  of  the  wavelet  transform  do  not  generally  contain  isolated 
coefficients.  This  regularity  can  aid  in  deciding  which  coefficients  should  be  selected  for  reconstruction,  and  which 
should  be  thrown  away  for  maximum  reconstructed  image  quality.  The  proposed  coefficient  selection  method  in 
which  spatial  regularity  is  exploited  is  shown  as  follows. 

Assume  that  an  image  signal  is  corrupted  with  additive  noise,  i.e. 

f(x,y)  =  f(x,y)  +  v(x,y),  (12) 

where  f(oc,  y)  is  the  noiseless  2D  signal,  77(21,  y)  is  a  random  noise  function,  and  f(x,  y)  is  the  corrupted  signal. 
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The  first  step  for  selecting  the  wavelet  coefficient  is  to  form  a  preliminary  binary  label  for  each  coefficient,  which 
collectively  form  a  binary  map.  The  binary  map  is  then  used  to  determine  whether  or  not  a  particular  wavelet 
coefficient  is  included  in  a  regular  spatial  feature.  The  wavelet  transform  of  /( x,  y)  generates  coefficients,  A. ,*;[•], 
from  Equations  9  and  10.  A.  *,[•]  is  used  to  create  the  preliminary  binary  map,  /.  *.[•]. 


i-Ax>y\ 


1,  when  \X.tk[x,y]\  >  t 

0,  else 


(13) 


where  r  is  a  threshold  for  selecting  valid  coefficients  in  the  construction  of  the  binary  coefficient  map.  A  valid 
coefficient  is  defined  as  a  coefficient,  \.^[x,y\,  which  results  in  I.^[x,y]  =  1;  hence  the  coefficient  has  been 
selected  due  to  its  magnitude.  After  coefficients  are  selected  by  magnitude,  spatial  regularity  is  used  to  further 
examine  the  role  of  the  valid  coefficient:  whether  it  is  isolated  noise  or  part  of  a  spatial  feature.  The  number  of 
supporting  binary  values  around  a  particular  non-zero  value  I..k[x,  y]  is  used  to  make  the  judgement.  The  support 
value,  S.tk [x,  y\,  is  the  sum  of  all  I.,k['}  which  support  the  current  binary  value  I-.k\x.  y\;  that  is,  the  total  number 
of  all  valid  coefficients  which  are  spatially  connected  to  I..k[x.  y\. 

A  coefficient  is  spatially  connected  to  another  if  there  exists  a  continuous  path  of  valid  coefficients  between  the 
two.  Figure  3  gives  a  generic  coefficient  map.  The  valid  coefficients  are  highlighted  in  gray.  From  Figure  3  it  can  be 
shown  that  coefficients  A,  B,  C,  and  H  do  not  support  any  other  valid  coefficients  in  the  coefficient  map.  However, 
coefficients  D  and  F  support  each  other,  coefficients  E  and  G  support  each  other,  and  N  and  O  support  each  other. 
Also,  coefficients  I,  J,  K,  F,  M,  P,  Q,  and  R  all  support  one  another.  Figure  4  gives  the  value  of  S.tk  [x,  y]  for  each  of 


Fig.  3.  Generic  coefficient  array 


the  valid  coefficients  given  in  Figure  3.  A  method  of  computing  y]  is  given  in  the  Appendix.  <S'.)fc[-]  is  used 

to  refine  the  original  binary  map  /.  *.[•]  by 


J-,k[x>y\ 


1,  when  S'.ifc[x,y]  >  s, 

or  J.tk+i[x,y]I.tk[z,y]  =  1  i 

0,  else 


(14) 
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Fig.  4.  Generic  coefficient  array,  with  corresponding  S.  &  values 

where  </.,&[•]  is  the  refined  binary  map,  and  s  is  the  necessary  number  of  support  coefficients  for  selection.  [■]  is 
calculated  recursively,  starting  from  the  highest  multiresolution  level,  and  progressing  downward. 

Equation  14  is  equal  to  one  when  there  exists  enough  wavelet  coefficients  of  large  magnitude  around  the  current 
coefficient.  However,  it  also  is  equal  to  one  when  the  magnitude  of  the  coefficient  is  effectively  large  (/.,&[•]  =  1) 
but  not  locally  supported  (</.,&[•]  =  0)  only  if  the  coefficient  of  the  larger  scale  is  large  and  locally  supported 
(■I-.k+l  [■]  =  1)-  The  decision  to  use  this  criterion  is  in  the  somewhat  rare  case  when  a  useful  coefficient  is  not 
locally  supported.  In  the  general  case,  wavelet  coefficients  of  images  are  clustered  together,  but  rarely  they  are 
isolated.  In  [15],  wavelet  coefficients  are  modified  only  by  their  evolution  across  scales.  Regular  signal  features 
contain  wavelet  coefficients  which  increase  with  increasing  scale.  Thus,  if  there  exists  a  useful  coefficient  which 
is  isolated  in  an  image,  it  is  reasonable  that  a  coefficient  in  the  same  spatial  location  of  an  increase  in  scale  will  be 
sufficiently  large  and  spatially  supported.  Thus,  the  coefficient  selection  method  provided  by  Equation  15  selects 
coefficients  which  are  sufficiently  large  and  locally  supported  as  well  as  isolated  coefficients  which  are  sufficiently 
large  and  supported  by  scale. 

This  type  of  scale-selection  is  consistent  with  the  findings  of  Said  and  Pearlman  [21],  who  developed  an  image 
codec  based  on  a  ’’spatial  self-symmetry”  between  differing  scales  in  wavelet  transformed  images.  They  discovered 
that  most  of  an  images  energy  is  concentrated  in  the  low-frequency  subbands  of  the  wavelet  transform.  And  because 
of  the  self-symmetry  properties  of  wavelet  transformed  images,  if  a  coefficient  value  is  insignificant  (i.e.  of  small 
value  or  zero),  then  it  can  be  assumed  that  the  coefficients  of  higher  spatial  frequency  and  same  spatial  location  will 
be  insignificant.  In  our  application,  however,  we  are  looking  for  significance  rather  than  insignificance,  so  we  look 
to  the  significance  of  lower  frequency  coefficients  to  determine  significance  of  the  current  coefficient.  In  this  way, 
the  preliminary  binary  map  is  refined  by  both  spatial  and  scalar  support,  given  by  equation  14. 
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The  final  coefficients  retained  for  reconstruction  are  given  by 

,  r  ,  J  A.ifc[a:,y],  when  J.tk[x,y]  =  1 

LMx,y\  =  \  ■  05) 

I  0,  else 

The  denoised  image  is  reconstructed  using  the  supported  coefficients,  Lk  [x,  y\  in  the  synthesis  equation  given  in 
Equation  11.  Thus, 

6tu,k\x,  y\  =  I  E m,n  h[m)h[n]aii,k+1  [x  -  2k+1m,  y  -  2k+ln] 

+  3  E m,n  h[m\g[n]Lhltk+1  [x  -  2 k+1m,  y  -  2 k+1n] 

(16) 

+  3  Em,n  9[™>\h[n\Llhikjp  1  [x  -  2 k+1m,  y  -  2 k+1n\ 

+  3  -  2 k+1m,y-  2 k+1n\ 

Equation  16  is  calculated  recursively  producing  scaling  coefficients  of  finer  resolution  until  k  =  — 1.  The  denoised 
image,  /(•),  is  then  given  by 

f(x,y)  =  au-i[x,y].  (17) 

&n  k  [•]  are  the  reconstructed  scaling  function  coefficients. 

In  general,  natural  and  synthetic  imagery  can  be  compactly  represented  in  few  wavelet  coefficients  of  large 
magnitude.  These  coefficients  are  in  general  spatially  clustered.  Thus,  it  is  useful  to  obtain  selection  methods 
based  on  magnitude  and  spatial  regularity  to  distinguish  between  useful  coefficients  which  are  representative  of  the 
image  and  useless  coefficients  representative  of  noise.  The  two-threshold  criteria  for  the  rejection  of  noisy  wavelet 
coefficients  is  a  computationally  simple,  non-iterative  test  for  magnitude  and  spatial  regularity  which  can  effectively 
distinguish  between  useful  and  useless  coefficients. 

IV.  Selection  of  Threshold  t  and  Support  s 

The  selection  of  threshold  r  and  support  s  is  a  key  component  of  the  denoising  algorithm.  Unfortunately,  the 
two  parameters  cannot  be  easily  determined  for  a  given  corrupted  image  because  there  is  no  information  about  the 
decomposition  between  the  original  signal  and  the  noise.  We  drive  t  and  s  using  a  set  of  test  images  which  serve 
as  training  samples.  These  training  samples  are  artificially  corrupted  by  noise.  The  noise  is  then  removed  by  a 
series  of  r  and  s.  The  set  of  r  and  s  which  generates  the  best  results  is  selected  for  noise  removing  in  general.  This 
approach  has  its  root  in  an  idea  called  oracle  (  [3])  which  is  described  below. 

An  oracle  is  an  entity  which  provides  extra  information  to  aid  in  the  denoising  process.  The  extra  information 
provided  by  the  oracle  is  undoubtedly  beneficial  in  providing  substantially  greater  denoising  results  than  methods 
which  are  not  furnished  supplemental  information.  Thus,  the  coefficient  selection  method  which  uses  the  oracle’s 
information  is  referred  to  as  the  optimal  denoising  method.  By  the  optimal  denoising  method  the  threshold  and 
support  can  be  selected  using  test  images  of  which  both  original  image  and  noise  are  known.  The  selected  threshold 
and  support  functions  can  then  be  selected  for  any  corrupted  images  without  supplemental  information. 
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An  optimal  coefficient  selection  process  has  been  defined  based  on  the  original  (noiseless)  image.  The  optimal 
binary  map  J°pt[-}  is  given  by 


jopt. 

J-,k 


[x,  y\ 


1,  when  \\.,k[x,y}\  >  an 

0,  else 


(18) 


where  A. ,*;[•]  are  the  wavelet  coefficients  of  the  original  (noiseless)  image  /(•),  and  an  is  the  standard  deviation 
of  the  noise  in  the  corrupted  image  /(•).  Thus,  the  extra  information  given  by  the  oracle  is  the  noiseless  wavelet 
coefficients,  A.  &[•].  The  coefficients  of  the  original  image  are  used  in  coefficient  selection  process,  but  not  in  the 
image  reconstruction.  The  coefficients  which  are  used  in  the  reconstruction,  Lop? [■],  are  given  by. 


roptr 


>y] 


A .,fc [x,  y] ,  when  .Jopt  [x,  y)  =  1 

0,  else 


(19) 


where  A.  ,*[-]  are  the  wavelet  coefficients  of  the  noisy  image. 

The  optimal  coefficient  map  is  used  to  create  the  optimal  denoised  image  which  is  given  by 


&°Cklx’V\  = 

3  E m  E n  h[m)hln\aiCk+1  [*  -  2fc+1"».  y  -  2 k+1n\ 

+  iEm  E„  /i[m]p[n]L°^fc+i  [x  -  2 fc+1m,  y  -  2 k+1n]  ■  (20) 

+  3  E m  E„  fl[m]/i[n]L^fc+i  [x  -  2 fc+1m,  y  -  2 k+1n] 

+  3  Em  En  9[m]g[n]L°hp^k+i  [x  -  2fc+1m,  y  -  2 k+1n] 

Equation  20  is  recursively  computed  for  lesser  values  of  k  until  the  optimal  denoised  image  is  achieved,  where 


fopt{x,y)  =  a°pl_1[x,y\.  (21) 

a°ptk[-}  are  the  optimal  scaling  coefficients,  and  fopt{-)  is  the  optimal  denoised  image.  Figure  5  gives  the  denoising 
results  of  the  optimal  denoising  method  when  applied  to  the  ’’Lenna”  image  corrupted  with  additive  white  Gaussian 
noise  (AWGN).  As  shown  in  Figure  5,  the  optimal  denoising  method  is  able  to  effectively  remove  the  noise  from 
the  ’’Fenna”  image  because  of  the  added  information  given  by  the  oracle.  PSNR  is  calculated  for  performance 
measurement  and  is  given  by 

/  955  \ 

PSNR  =  20  log10  -  ,  (22) 

\  ymse  J 

where 

mse  =  y)  ~  y ))  •  (23) 

J  J  x  y 

mse  is  the  mean-squared  error  between  the  original  image  /(•)  and  the  denoised  image  /(•),  and  Wf  and  Hf  are 
the  width  and  height  of  the  image,  respectively. 

It  is  rather  obvious  that  the  optimal  coefficient  selection  process  is  unattainable  since  no  supplemental  informa¬ 
tion  is  provided  by  the  oracle  for  corrupted  images.  Thus  the  optimal  image  denoising  method  is  not  possible  for 
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Fig.  5.  Optimal  denoising  method  applied  to  noisy  ’’Lenna”  image.  Left:  Corrupted  image  f(x,y),  crn  =  50,  PSNR  =  14.16  dB.  Right: 
Optimally  denoised  image  fopt(x,  y),  PSNR  =  27.72  dB. 


practical  implementation.  However,  the  knowledge  obtained  by  the  optimal  binary  map,  J°vk  [■],  is  used  to  compare 
with  the  refined  coefficient  map  generated  by  the  two-threshold  criteria,  described  in  Section  III.  The  coeffi¬ 

cient  selection  method  is  based  on  the  error  between  the  optimal  coefficient  subband  and  the  subband  generated  by 
the  two-threshold  criteria.  The  error  is  given  by 


-LJrrnr  — 


(JpPk[x’y]®JP,k[x,y])*l,k[x’V} 


E 


p€{hl  ,lh  ,hh}  ,k  ,x  ,y  p,k 


[x,y] 


(24) 


where  ®  is  the  exclusive  OR  operation. 

In  the  proposed  coefficient  selection  algorithm,  we  use  a  training  sample  approach.  The  approach  starts  with 
a  series  of  test  images  serving  as  training  samples  to  derive  the  functions  which  determine  the  optimal  set  of  the 
values  for  r  and  s  as  well  as  the  type  of  wavelet  used  for  denoising.  Theoretically,  we  may  represent  each  training 
sample  as  a  vector  V^i  =  1 .  n.  Those  training  samples  should  span  a  space  which  covers  more  corrupted  images 
than  the  training  samples: 

S  =  Span{T4;  *  =  1,  ..n}.  (25) 


The  original  data  and  the  statistical  distribution  of  the  noise  are  given  for  each  of  the  training  samples  which  are 
corrupted.  The  optimal  set  of  parameters  can  then  be  determined  for  the  training  samples  using  the  approach 
described  earlier.  Ideally,  the  space  spanned  by  the  training  samples  contains  the  type  of  the  corrupted  images 
which  are  to  be  denoised.  As  a  result,  the  same  set  can  generate  an  optimal  or  close  to  optimal  performance  for  the 
corrupted  images  of  same  type.  It  is  clear  that  more  training  samples  will  generate  parameters  suitable  for  more 
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types  of  images,  while  a  space  of  fewer  training  samples  is  suitable  for  a  less  types  of  images.  In  the  following,  we 
will  use  some  examples  to  illustrate  this  training  approach.  The  test  images  are  all  256x256  pixels.  Starting  from 


Fig.  6.  Test  images. 


the  upper-left  image  and  going  clockwise,  the  images  are  ’’Lenna”,” Airplane”,  ’’Fruits”,  and  "Girl”.  Each  of  the 
images  shown  in  Figure  6  is  well  known  in  the  image  processing  community,  and  collectively  represents  as  many 
kinds  of  images  as  possible.  In  this  way,  the  r  and  s  obtained  will  likely  perform  well  in  most  cases. 

A  test  is  used  to  demonstrate  the  effectiveness  of  different  wavelets  in  denoising.  First,  each  of  the  four  test 
images  is  corrupted  with  AWGN  at  various  levels.  Next,  the  2D  non-decimated  wavelet  transform,  given  in  Section 
II,  is  calculated  using  several  different  wavelets.  The  wavelet  coefficients  are  then  hard  thresholded  using  a  threshold 
T  ranging  from  0  —  150,  and  the  inverse  wavelet  transform  is  applied  to  the  thresholded  coefficients.  The  wavelet 
which  gives  the  reconstructed  images  with  the  highest  average  PSNR  is  chosen  to  be  used  in  the  general  case. 

Several  wavelets  were  used  in  the  testing.  However,  for  simplicity  only  five  are  presented.  We  have  chosen  the 
Daubechies  wavelets  [2]  (Daub4  and  Daub8)  for  their  smoothness  properties,  the  spline  wavelets  (first  order  and 
quadradic  spline)  [1]  because  of  their  use  in  the  previous  works  of  [14, 15, 19],  and  the  Haar  wavelet  because  of  its 
simplicity  and  compact  support.  The  results  are  given  in  Figure  7. 

After  the  testing  results  given  in  Figure  7,  the  Haar  wavelet  is  selected  for  image  denoising: 


i  ,  n  =  0,1 
h[n]  =  {  'ft 

0,  else 


y/5’  71  ~  0 

n=  1  • 


(26) 


0,  else 

Testing  has  shown  the  Haar  wavelet  to  be  the  most  promising  in  providing  the  highest  reconstructed  image  quality. 
The  compact  support  of  the  Haar  wavelet  enables  the  wavelet  coefficients  to  represent  the  least  number  of  original 
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0  50  100  150  0  50  100  150 

threshold  T  threshold  T 

denoising  using  different  wavelets,  on  =  30  denoising  using  different  wavelets,  an  =  40 


threshold  T  threshold  T 

Fig.  7.  Average  PSNR  Values  using  Different  Wavelets 

pixels  in  comparison  with  other  types  of  wavelets.  Therefore,  when  a  coefficient  is  removed  because  of  its  insignif¬ 
icance  or  isolation,  the  result  affects  the  smallest  area  of  the  original  image  in  the  reconstruction.  That  could  reduce 
the  impact  to  the  image  quality  even  if  a  removed  coefficient  is  not  only  comprised  of  noise. 

The  Haar  wavelet  is  used  in  a  non-decimated  wavelet  decomposition  of  the  original  image.  Three  subband  levels 
are  used,  i.e.  k  =  —  1  to  2.  The  proposed  selective  wavelet  shrinkage  algorithm  is  applied  to  all  wavelet  subbands, 
and  the  subbands  are  synthesized  by  the  non-decimated  inverse  wavelet  transform. 

Testing  for  the  optimal  values  of  r  and  s  is  accomplished  by  artificially  adding  Gaussian  noise  to  each  of  the  four 
images,  denoising  all  four  images  with  a  particular  r  and  s,  and  recording  the  average  error  given  by  Equation  24. 
Then,  the  combination  of  r  and  s  which  gives  the  lowest  error  is  the  choice  for  that  particular  noise  level. 

The  average  error  is  recorded  when  denoising  each  of  the  four  test  images  given  in  Figure  6  using  r  ranging  from 
0  —  150  and  s  ranging  from  0  —  20.  The  proposed  algorithm  is  tested  by  applying  AWGN  with  a  standard  deviation 
(<7„)  of  10,  20,  30,  40,  and  50  to  each  of  the  test  images.  The  proposed  method  of  selective  wavelet  shrinkage  is 
applied  to  the  corrupted  image,  and  the  resulting  error  is  recorded  using  Equation  24.  The  results  of  the  testing  in 
which  crn  =  30  is  given  in  Figure  8. 

Table  I  gives  the  r  and  s  which  provide  the  lowest  average  error  for  each  noise  level  tested.  These  particular 
values  are  referred  to  as  rm(-)  and  sm(-).  Table  I  suggests  that  parameters  rm(-)  and  sm(-)  are  functions  of  the 
standard  deviation  of  the  noise,  an. 

Another  set  of  images  were  also  tested  as  training  samples,  and  the  result  is  shown  in  Table  II.  One  can  see  that 
the  optimal  values  of  s  and  r  are  different  from  that  of  the  first  set  since  the  images  are  different.  In  general  the 
trained  s  and  r  should  do  well  for  a  variety  of  images  covered  by  the  space  which  is  spanned  by  the  sample  images 
because  the  errors  do  not  increase  quickly  as  s  and  r  vary.  This  is  shown  in  Figure  8  from  which  one  can  see  that 
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Error  Results  with  noise,  a  =  30 


Fig.  8.  Error  results  for  test  images,  an  =  30. 

TABLE  I 

Minimum  average  error  of  test  images  for  various  noise  levels  and  their  corresponding  threshold  and  support 

VALUES. 


Noise  Level(<r„) 

10 

20 

30 

40 

50 

Min.  Avg.  Error 

3e-4 

1  1e-4 

24e-4 

42e-4 

64e-4 

.srn  value 

5 

9 

10 

15 

14 

Tm  value 

23 

43 

63 

85 

108 

for  a  relatively  large  range  of  s  and  r,  the  error  is  close  to  minimal. 

Because  rm(-)  and  sm(-)  generally  increase  with  an  increase  in  additive  noise  as  shown  in  Table  I,  both  pa¬ 
rameters  can  be  modeled  as  functions  of  the  additive  noise,  crn.  Then,  knowing  the  level  of  noise  corruption,  the 
threshold  levels  which  produce  the  minimum  error,  Error,  may  be  obtained  by  estimating  the  rm(-)  and  sm(-) 
functions.  The  five  noise  levels  provided  in  the  test  are  used  as  sampling  points  for  the  estimation  of  the  continuous 
functions  rm(-)  and  sm(-).  With  enough  sampling  points  both  rm(-)  and  sm(-)  can  be  effectively  estimated,  and 
the  correct  r  and  s  can  be  calculated  to  denoise  an  image  with  any  level  of  noise  corruption,  given  that  the  noise 
level  is  known. 

The  estimated  functions  of  the  sampled  values  rm(-)  and  sm(-)  are  referred  to  as  rm(-)  and  sm(-),  respectively. 
Once  the  estimated  functions  are  calculated  they  are  used  in  the  general  case.  Thus,  given  an  image  corrupted  with 
noise,  it  is  denoised  with  no  prior  knowledge  by  estimating  the  level  of  noise  corruption,  calculating  the  proper 
thresholds  using  the  rm(-)  and  sm(-)  functions,  and  using  the  calculated  threshold  levels  in  the  denoising  process 
given  in  Section  III. 
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TABLE  II 

Minimum  average  error  of  second  set  of  test  images  for  various  noise  levels  and  their  corresponding  threshold 

AND  SUPPORT  VALUES. 


Noise  Level((j„) 

10 

20 

30 

40 

50 

Min.  Avg.  Error 

3e-4 

12e-4 

28e-4 

48e-4 

73e-4 

srn  value 

12 

16 

16 

17 

18 

Tm  value 

15 

31 

49 

69 

82 

V.  Estimation  of  Parameter  Values 

It  can  be  shown  from  the  values  given  in  Table  I  that  the  parameters  rm(-)  and  srn(-)  are  functions  of  an ; 
therefore,  we  need  to  estimate  the  standard  deviation  of  the  noise  level,  and  the  functions.  These  two  topics  are 
discussed  in  this  section.  Another  idea  for  selecting  the  two  parameters  is  to  use  the  signal-noise-ratio  (SNR)  of 
the  image.  Unfortunately,  the  SNR  information  for  a  noised  image  is  not  given,  and  very  hard  to  derive  if  not 
impossible  for  reasons  mentioned  earlier,  i.e.,  one  has  no  idea  about  the  level  of  the  original  signal,  and  has  to  use 
an  entirely  different  way  to  estimate  SNR  of  a  corrupted  image.  On  the  other  hand,  there  are  standard  procedures 
for  estimating  the  standard  deviation  of  the  noise  level,  one  of  which  is  shown  below. 


A.  Noise  Estimation 


The  level  of  noise  in  a  given  digital  image  is  unknown  and  must  be  estimated  from  the  noisy  image  data.  Several 
well  known  algorithms  have  been  given  in  the  literature  to  estimate  image  noise.  From  [4, 19]  a  median  value  of 
the  A*,/,  o[-]  subband  is  used  in  the  estimation  process.  The  median  noise  estimation  method  of  [19]  is  used. 


Median(\Xhh,o[-]\) 

0.6745 


(27) 


where  A/,/,  o[-]  arc  the  noisy  wavelet  coefficients  in  the  high-high  band  of  the  0th  scale.  Because  the  vast  majority 
of  useful  information  in  the  wavelet  domain  is  confined  to  few  and  large  coefficients,  the  median  can  effectively 
estimate  the  level  of  noise  (i.e.  the  average  level  of  the  useless  coefficients)  without  being  adversely  influenced  by 
useful  coefficients. 


B.  Parameter  Estimation 

Using  the  known  level  of  noise  added  to  the  original  images,  the  values  of  rm(-)  and  sm(-),  given  in  Table  I,  are 
estimated.  One  of  the  simplest  and  most  popular  estimation  procedures  is  the  the  LMMSE  (Linear  Minimum  Mean 
Squared  Error)  method,  and  it  is  used  as  the  estimation  procedure  [23],  That  is,  two  parameters  aT  and  bT  are  found 
such  that 

'fnniPn)  ~  ttTCtn  T  (28) 
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The  choice  of  aT  and  bT  will  minimize  the  mean  squared  error.  Similarly,  an  estimate  of  sm,  which  must  be  an 
integer,  is  found  as: 

fhn(o-n)  =  [as(Tn  +  bs  J .  (29) 

The  parameters  which  minimize  the  mean  squared  error  are:  aT  =  2.12,  bT  =  0.80,  as  =  0.26,  and  bs  =  2.81. 

The  LMMSE  estimation  procedure  gives  a  simple  description  of  the  rm  and  sm  functions.  That  is,  there  are  only 
two  values  needed  (a  and  b)  to  be  able  to  determine  the  proper  thresholds  for  denoising.  The  LMMSE  estimator 
also  is  shown  to  be  a  good  fit  into  the  test  data  given  in  Figure  9.  The  values  of  rm(-),  and  sm( •)  are  given  as  well 
as  their  corresponding  LMMSE  estimates.  The  LMMSE  estimate  functions  are  the  best  linear  fit  into  the  data.  Note 
that  the  support  value  sm  must  be  an  integer. 


140 1 — 

120  - 

100- 

80  - 

60  - 

40  - 

20  - 

0  — 
0 


Threshold  and  support  estimation  based  upon  noise  level 


Threshold  level  for  min.  error 
Threshold  estimate 


20  30  40 

Noise  level  (standard  deviation,  o) 


Local  support  value  for  min.  error 
Support  estimate _ 


20  30  40 

Noise  level  (standard  deviation,  o) 


Fig.  9.  rm(-),  sm(- )  and  their  corresponding  estimates,  rm(-),  sm(-). 


The  threshold  r  and  the  support  value  s  are  determined  by  using  the  estimate  of  the  noise  given  by  Equation  27. 
The  two  thresholds  are  given  by 


r  =  aTan  +  bT 
s  =  [  asOn  +  M 


(30) 


VI.  Experimental  Results 

The  ’’peppers”  and  ’’house”  images  are  used  for  gauging  the  performance  of  the  proposed  denoising  algorithm. 
These  two  images  have  also  been  used  in  the  results  of  [14, 15, 19].  Therefore,  the  performance  of  the  proposed 
algorithms  is  compared  with  other  recent  algorithms  given  in  the  literature.  Both  the  ’’peppers”  image  and  "house” 
image  are  corrupted  with  AWGN  and  used  the  proposed  method  for  denoising.  The  results  are  given  in  Figures  10 
and  11. 

Table  III  gives  the  results  of  the  proposed  method,  as  well  as  the  results  of  [14, 15, 19].  Note  that  the  methods  of 
[14, 15, 19]  all  use  the  quadratic  spline  wavelet  [1]  in  three  subband  levels,  and  each  of  the  algorithms’  coefficient 
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TABLE  III 

PSNR  COMPARISON  OF  THE  OF  THE  PROPOSED  METHOD  TO  OTHER  METHODS  IN  THE  LITERATURE  (results  given  in  dB). 


’’Peppers” 

Image  Input  PSNR 

22.6 

19.6 

16.6 

13.6 

Average 

Proposed  Algorithm 

31.00 

28.98 

27.17 

25.46 

28.15 

Pizurica  3-band,  [19] 

30.20 

28.60 

27.00 

25.20 

27.75 

Pizurica  2-band,  [19] 

29.90 

28.20 

26.60 

24.90 

27.40 

Malfait  and  Roose,  [14] 

28.60 

27.30 

26.00 

24.60 

26.63 

Mallat  and  Hwang,  [15] 

28.20 

27.30 

27.10 

24.60 

26.80 

Matlab’s  Sp.  Adaptive  Wiener 

29.00 

27.10 

25.30 

23.30 

26.18 

’’House” 

Image  Input  PSNR 

23.9 

20.9 

17.9 

14.9 

Average 

Proposed  Algorithm 

33.09 

31.55 

29.81 

28.34 

30.70 

Pizurica  3-band,  [19] 

32.80 

31.30 

29.80 

28.30 

30.55 

Pizurica  2-band,  [19] 

32.10 

30.50 

29.30 

28.10 

30.00 

Malfait  and  Roose,  [14] 

32.90 

31.30 

29.80 

28.20 

30.55 

Mallat  and  Hwang,  [15] 

31.30 

30.50 

29.10 

27.10 

29.50 

Matlab’s  Sp.  Adaptive  Wiener 

30.30 

28.60 

26.70 

24.90 

27.63 

selection  method  is  based  on  a  probabilistic  formulation  to  determine  the  amount  that  a  particular  coefficient  con¬ 
tributes  to  the  overall  image  quality.  The  proposed  algorithm  uses  the  Haar  wavelet,  given  in  Equation  26  in  three 
subband  levels,  and  the  coefficient  selection  process  is  based  on  a  geometrical  approach.  As  shown  in  Table  III,  the 
results  of  the  proposed  method  are  an  improvement  over  other  methods  described  in  the  literature.  In  addition  to 
improved  performance,  the  proposed  algorithm  is  computationally  simple  to  facilitate  real-world  applications.  The 
proposed  algorithm  has  been  computed  on  older  processors  for  an  accurate  comparison,  and  the  computation  time 
of  the  proposed  method  is  an  order  of  magnitude  less  than  the  previous  method  of  highest  performance,  [19].  Table 
IV  gives  the  computational  results  of  the  proposed  method  as  well  as  the  results  of  [14, 19], 

The  proposed  algorithm  shows  a  substantial  drop  in  computation  time.  Both  [14]  and  [19]  used  iterative  compu¬ 
tation  in  the  selection  of  wavelet  coefficients  for  reconstruction  which  requires  unreasonable  computation  time  for 
certain  applications.  The  current  two-threshold  technique  is  a  simpler,  non-iterative  coefficient  selection  method 
which  produces  greater  performance  results. 


VII.  Conclusions 

In  this  paper,  a  new  selective  wavelet  shrinkage  algorithm  for  image  denoising  has  been  described.  The  proposed 
algorithm  uses  a  two-threshold  support  criteria  which  investigates  coefficient  magnitude,  spatial  support,  and  sup- 
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TABLE  IV 

Computation  times  for  a  256x256  image,  in  seconds. 


Processor 

Pentium  IV 

Pentium  III 

IBM  RS6000/320H 

Proposed  Algorithm 

0.66 

1.14 

*** 

Pizurica  3 -band,  [19] 

*** 

45.00 

*** 

Pizurica  2-band,  [19] 

*** 

30.00 

*** 

Malfait  and  Roose,  [14] 

*** 

*** 

180.00 

***  Computation  time  not  evaluated 


port  across  scales  in  the  coefficient  selection  process.  In  general,  images  can  be  accurately  represented  by  a  few 
large  wavelet  coefficients,  and  those  few  coefficients  are  spatially  clustered  together.  The  two-threshold  criteria  is  an 
efficient  and  effective  way  of  using  the  magnitude  and  spatial  regularity  of  wavelet  coefficients  to  distinguish  useful 
from  useless  coefficients.  Furthermore,  the  two-threshold  criteria  is  a  non-iterative  solution  to  selective  wavelet 
shrinkage  to  provide  a  computationally  simple  solution,  facilitating  realtime  image  processing  applications. 

The  values  of  the  two-thresholds  are  determined  by  minimizing  the  error  between  the  coefficients  selected  by  the 
two-thresholds  and  the  coefficients  selected  by  a  denoising  method  which  uses  supplemental  information  provided 
by  an  oracle.  The  supplemental  information  provided  by  the  oracle  is  useful  in  determining  the  correct  coefficients 
to  select,  and  the  denoising  performance  is  substantially  greater  than  methods  which  do  not  use  the  supplemental 
information.  Thus,  the  method  which  uses  the  supplemental  information  provided  by  the  oracle  is  referred  to  as  the 
optimal  denoising  method.  Therefore,  by  minimizing  the  error  between  the  two-threshold  method  and  the  optimal 
denoising  method,  the  two-threshold  method  can  come  as  close  as  possible  to  the  performance  of  the  optimal 
denoising  method. 

Consequently,  the  two-threshold  method  of  selective  wavelet  shrinkage  provides  an  image  denoising  algorithm 
which  is  superior  to  previous  image  denoising  methods  given  in  the  literature  both  in  denoised  image  quality  and 
computation  time.  The  light  computational  burden  of  the  proposed  denoising  method  makes  it  suitable  for  real-time 
image  processing  applications. 


VIII.  Appendix 

The  computation  of  S.^[ x,  y\  is  given  by  the  following  algorithm: 
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N()  =  {[-1,  -1],  [-1, 0],  [-1, 1],  [0,  -1],  [0, 1],  [1,  -1],  [1,0],  [1, 1]} 

O[-]=0,  t  =  0,  p  =  0,  D.tk(0)  =  (x,y) 
if  I;k\x,y\  ==  1, 

while  D.,k(t)  7^  NULL, 

(■ i,j )  =  D;k{t) 
t  =  t  - hi 

for  m  =  0  to  7, 

if  ((I-A(iJ)  +  N(m)\  ==  1) 

and  ( 0[(i,j )  +  N(m)]  ==  0)), 
p  =  p  +  1 

D.,k(p)  =  (( i,j )  +  N(m)) 

0[(i,j)  +  N  (to)]  =  1, 
end  if 
end  for 
end  while 
end  if 

S.,k[x,y]  =  t 

0[x,  y]  is  a  binary  value  to  determine  whether  a  particular  I.  k\x,  y\  value  has  been  counted  previously.  D  is  an 
array  of  spatial  coordinates  of  valid  coefficients  that  support  the  current  coefficient  I._k[x,  y],  N  is  a  set  of  vectors 
corresponding  to  neighboring  coefficients. 
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Fig.  11.  Results  of  the  proposed  image  denoising  algorithm.  Top  left;  original  ’’house”  image.  Top  right;  corrupted  image,  cr ? 
=  17.90  dB.  Bottom;  denoised  image  using  the  proposed  method,  PSNR  =  29.81  dB. 
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Abstract 

A  combined  spatial  and  temporal  domain  wavelet  shrinkage  algorithm  for  video  denoising  is  pre¬ 
sented  in  this  paper.  The  spatial  domain  denoising  technique  is  a  selective  wavelet  shrinkage  method 
which  uses  a  two-threshold  criteria  to  exploit  the  geometry  of  the  wavelet  sub-bands  of  each  video 
frame,  and  each  frame  of  the  image  sequence  is  spatially  denoised  independently  of  one  another.  The 
temporal  domain  denoising  technique  is  a  selective  wavelet  shrinkage  method  which  estimates  the  level 
of  noise  corruption  as  well  as  the  amount  of  motion  in  the  image  sequence.  The  amount  of  noise  is 
estimated  to  determine  how  much  filtering  is  needed  in  the  temporal  domain,  and  the  amount  of  mo¬ 
tion  is  taken  into  consideration  to  determine  the  degree  of  similarity  between  consecutive  frames.  The 
similarity  affects  how  much  noise  removal  is  possible  using  temporal  domain  processing.  Using  motion 
and  noise  level  estimates,  a  video  denoising  technique  is  established  which  is  robust  to  various  levels  of 
noise  corruption  and  various  levels  of  motion. 

Keywords — video  denoising,  combined  spatial  and  temporal  domain  processing,  selective  wavelet 
shrinkage,  motion  estimation. 


I.  Introduction 

The  recent  advance  in  multimedia  technology  has  promoted  a  large  amount  of  research  in  the  area  of 
image  and  video  processing.  Included  in  many  image  and  video  processing  applications  which  include 
compression,  enhancement,  and  target  recognition  are  preprocessing  functions  for  noise  removal.  Noise 
removal  is  one  of  the  most  common  and  important  processing  steps  in  many  image  and  video  systems. 

Because  of  the  commonality  of  noise  removal  functions  in  most  image  and  video  systems,  there  has 
been  an  large  amount  of  research  dedicated  to  the  subject  of  image  denoising  over  the  past  several  decades, 
and  many  different  mathematical  tools  have  been  proposed.  Various  established  denoising  methods  using 
variable  coefficient  linear  filters  [5,  18,22,29],  adaptive  nonlinear  filters  [10,  17,  19,30],  DCT  based 
solutions  [12],  cluster  filtering  [28],  genetic  algorithms  [27],  fuzzy  logic  [13,  24],  etc.,  have  all  been 
proposed  in  the  literature. 

The  wavelet  transform  has  also  been  used  to  suppress  noise  in  digital  images.  It  has  been  shown  that 
the  reduction  in  absolute  value  of  wavelet  coefficients  is  successful  in  signal  restoration  [16].  This  process 
is  known  as  wavelet  shrinkage.  Other  denoising  techniques  select  or  reject  wavelet  coefficients  based  on 
their  predicted  contribution  to  reconstructed  image  quality.  This  process  is  known  as  selective  wavelet 
shrinkage,  and  many  works  have  used  it  as  the  preferred  method  of  image  denoising  [1,4, 6,7,9, 11, 14- 
16,20,26], 

However  until  recently,  the  removal  of  noise  in  video  signals  has  not  been  studied  seriously.  Cocchia, 
et.  al.,  developed  a  three  dimensional  rational  filter  for  noise  removal  in  video  signals  [3],  The  3D 
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rational  filter  removes  noise,  but  also  preserves  important  edge  information.  Also,  the  3D  rational  filter 
uses  a  motion  estimation  technique.  Where  there  is  no  motion  detected,  the  3D  rational  filter  is  applied 
in  the  temporal  domain.  Otherwise,  only  spatial  domain  processing  is  applied. 

Later,  Zlokolica,  et.  al.,  uses  two  new  techniques  for  noise  removal  in  image  sequences  [30].  Both 
these  new  techniques  show  improved  results  upon  the  method  of  [3],  The  first  method  is  an  alpha- 
trimmed  mean  filter  of  [2]  extended  to  video  signals,  and  the  second  is  the  K  nearest  neighbors  (KNN) 
filter.  Both  alpha-trimmed  and  KNN  denoising  methods  arc  based  on  ordering  the  pixel  values  in  the 
neighborhood  of  the  location  to  be  filtered,  and  averaging  a  portion  of  those  spatially  contiguous  pixels. 
Each  of  these  methods  attempts  to  average  values  which  arc  close  in  value,  and  avoid  averaging  values 
which  arc  largely  dissimilar  in  value.  Thus,  the  image  sequence  is  smoothed  without  blurring  edges,  or 
smearing  motion. 

However,  because  the  success  of  the  wavelet  transform  over  other  mathematical  tools  in  denoising 
images,  some  researchers  believe  that  wavelets  may  be  successful  in  the  removal  of  noise  in  video  signals 
as  well.  Pizurica,  et.  al.,  uses  a  wavelet-based  image  denoising  method  to  remove  noise  from  each 
individual  frame  in  an  image  sequence  and  then  applies  a  temporal  filtering  process  for  temporal  domain 
noise  removal  [21].  The  combination  of  wavelet  image  denoising  and  temporal  filtering  outperforms 
both  wavelet  based  image  denoising  techniques  [1, 15, 16,20]  and  spatial-temporal  filtering  techniques 
[2,3,30], 

The  temporal  domain  filtering  technique  described  in  [21]  is  a  linear  HR  filter  which  will  continue 
to  filter  until  it  reaches  a  large  temporal  discontinuity.  It  will  not  filter  the  locations  of  large  temporal 
discontinuity  where  the  absolute  difference  in  neighboring  pixel  values  is  greater  than  a  threshold,  T, 
thus  preserving  motion  while  removing  noise. 

Although  temporal  processing  aids  in  the  quality  of  the  original  image  denoising  method,  the  parameter 
T  varies  with  differing  video  signals  for  improved  performance.  That  is,  proper  selection  of  T  may  be 
large  in  sequences  where  there  is  little  motion  for  improved  noise  removal,  i.e.,  there  is  more  redundancy 
between  consecutive  frames.  Thus  the  redundancy  may  be  exploited  by  a  large  T  to  improve  video 
quality.  However,  in  image  sequences  where  there  exists  a  large  amount  of  motion,  consecutive  frames 
are  more  independent  and  there  exists  little  to  no  redundancy  to  exploit.  Thus,  the  parameter  T  must  be 
small  to  achieve  optimal  performance. 

In  the  case  of  video  denoising,  it  has  been  fairly  well  documented  that  the  amount  of  noise  removal 
achievable  from  temporal  domain  processing,  while  preserving  overall  quality,  is  dependent  on  the  amount 
of  motion  in  the  original  video  signal  [3,21].  Thus,  a  robust,  high-quality  video  denoising  algorithm  is  re¬ 
quired  to  not  only  be  scalable  to  differing  levels  of  noise  corruption,  but  also  scalable  to  differing  amounts 
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of  motion  in  the  original  signal.  Unfortunately,  this  principle  has  not  been  seriously  considered  in  video 
denoising. 

In  this  paper,  we  develop  a  noise  removal  algorithm  for  video  signals.  This  algorithm  uses  selective 
wavelet  shrinkage  in  all  three  dimensions  of  the  image  sequence  and  proves  to  outperform  the  few  video 
denoising  algorithms  given  in  the  relevant  literature.  First,  the  individual  frames  of  the  sequence  are 
denoised  by  the  method  of  [1],  which  we  had  developed  earlier.  Then  a  new  selective  wavelet  shrinkage 
method  is  used  for  temporal  domain  processing. 

Also,  a  motion  estimation  algorithm  is  developed  to  determine  the  amount  of  temporal  domain  pro¬ 
cessing  to  be  performed.  Several  motion  estimators  have  been  proposed  [3,21],  but  few  arc  robust  to 
noise  corruption.  The  proposed  motion  estimation  algorithm  is  robust  to  noise  corruption  and  an  im¬ 
provement  over  the  motion  estimation  method  of  [3],  The  proposed  denoising  algorithm,  including  the 
proposed  motion  estimation  method,  is  experimentally  determined  to  be  an  improvement  over  the  meth¬ 
ods  of  [3,21,30]. 

Following  the  Introduction,  Section  II  gives  a  brief  description  of  the  image  denoising  method  of  [1], 
used  as  the  spatial  denoising  method  in  the  proposed  video  denoising  algorithm.  Section  III  describes  the 
temporal  domain  wavelet  shrinkage  method  and  explores  the  proper  order  of  temporal  and  spatial  domain 
processing  functions.  Section  IV  provides  the  proposed  motion  estimation  index  used  in  the  temporal 
domain  processing  and  compares  it  with  the  motion  estimation  method  of  [3],  Section  V  develops  the 
parameters  for  temporal  domain  processing,  and  Section  VI  gives  the  experimental  results  of  the  proposed 
method  as  well  as  other  established  methods.  Section  VII  concludes  the  paper. 

II.  Spatial  Domain  Denoising  Technique 

The  proposed  video  denoising  technique  uses  the  selective  wavelet  shrinkage  algorithm  of  [1]  for 
denoising  of  the  individual  frames  of  the  image  sequence.  A  brief  review  of  the  algorithm  is  included  in 
this  section  for  completeness. 

A.  The  Coefficient  Selection  Method 

First,  we  will  review  the  proposed  coefficient  selection  method  of  [1],  The  coefficient  selection  method 
is  based  on  a  two-threshold  criteria,  selecting  wavelet  coefficients  with  large  magnitude  and  spatial  regu¬ 
larity. 

Assume  that  an  image  signal  is  corrupted  with  additive  noise,  i.e., 

=  +  ),  (1) 
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where  f(l)  is  the  noiseless  image  pixel  of  position  l,  rj(l)  is  a  random  noise  function,  and  f(l)  is  the 
corresponding  corrupted  signal. 

The  wavelet  shrinkage  algorithm  takes  the  non-decimated  2D  wavelet  transform  of  /(/),  and  selects 
the  wavelet  coefficients  for  denoising.  The  first  step  for  selecting  the  wavelet  coefficient  is  to  find  a 
binary  label  for  each  coefficient  which  collectively  forms  a  binary  map.  The  binary  map  is  then  used  to 
determine  whether  or  not  a  particular  wavelet  coefficient  is  included  in  a  regular  spatial  feature.  The  non- 
decimated,  2D  wavelet  transform  of  f(l)  generates  coefficients,  Amy.[t]  of  spatial  location  l,  resolution  k, 
and  subband  m  £  {Ih,  hi,  hh}.  The  subband  designation  m  denotes  the  low-high  (Ih),  high-low  (hi),  and 
high-high  (hh)  subbands.  For  example,  the  Ih  subband  is  produced  by  convolving  the  input  function  with 
the  low-pass  scaling  filter,  h[-\,  in  the  horizontal  dimension  then  convolving  the  result  with  the  high-pass 
wavelet  filter,  g[-},  in  the  vertical  dimension.  A m,k[l]  is  used  to  create  the  preliminary  binary  label,  Im,k[l\- 


Im,k  [^] 


1,  when  |Am;fc[/]|>r 

0,  else 


(2) 


where  r  is  a  threshold  for  selecting  valid  coefficients  in  the  construction  of  the  binary  coefficient  map.  A 
valid  coefficient  is  defined  as  a  coefficient,  Am./,.[/],  which  results  in  l,n  j-[l]  =  1;  hence  the  coefficient  has 
been  selected  due  to  its  magnitude.  After  coefficients  are  selected  by  magnitude,  spatial  regularity  is  used 
to  further  examine  the  role  of  the  valid  coefficient',  whether  it  is  isolated  noise  or  pail  of  a  spatial  feature. 
The  number  of  supporting  binary  values  around  a  particular  non-zero  value  Irn  j-[l]  is  used  to  make  the 
judgement.  The  support  value,  ,Sj [1],  is  the  sum  of  all  Imj. [/]  which  support  the  current  binary  value 
Im,k[l]  \  that  is,  the  total  number  of  all  valid  coefficients  which  are  spatially  connected  to  Im,k[l\- 

A  coefficient  is  spatially  connected  to  another  if  there  exists  a  continuous  path  of  valid  coefficients 
between  the  two.  Figure  1  gives  a  generic  coefficient  map.  The  valid  coefficients  are  highlighted  in  gray. 
From  Figure  1  it  can  be  shown  that  coefficients  A,  B,  C,  and  H  do  not  support  any  other  valid  coefficients 
in  the  coefficient  map.  However,  coefficients  D  and  F  support  each  other,  coefficients  E  and  G  support 
each  other,  and  N  and  O  support  each  other.  Also,  coefficients  I,  J,  K,  L,  M,  P,  Q,  and  R  all  support  one 
another.  Figure  2  gives  the  value  of  Srnj,[l]  for  each  of  the  valid  coefficients  given  in  Figure  1.  A  method 
of  computing  Sm>k[l]  is  given  in  [1].  Srnj-\l}  is  used  to  refine  the  original  binary  map  Im,k[l]  by 


J-,k[x,y\ 


1,  when  S.tk[x,y\  >  s, 

or  J.'k+i[x,y]I.)k[x,y\  =  1 


0,  else 


(3) 


where  Jm,k[l\  is  the  refined  coefficient  map,  and  s  is  the  necessary  number  of  support  coefficients  for 
selection.  Jv[-]  is  calculated  recursively,  stalling  from  the  highest  multiresolution  level,  and  progressing 
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Fig.  1 .  Generic  coefficient  array 


Fig.  2.  Generic  coefficient  array,  with  corresponding  S.,k  values 

downward. 

Equation  3  is  equal  to  one  when  there  exists  enough  wavelet  coefficients  of  large  magnitude  around 
the  current  coefficient.  However,  it  also  retains  coefficients  in  which  the  magnitude  of  the  coefficient  is 
effectively  large  (Im.k[l]  =  1)  but  not  locally  supported  (Jrn,k[l]  =  0)  only  if  the  coefficient  of  the  larger 
scale  is  large  and  locally  supported  (Jm,k+1  [l]  =  !)■  The  decision  to  use  this  criterion  is  in  the  somewhat 
rare  case  when  a  useful  coefficient  is  not  locally  supported.  In  the  general  case,  wavelet  coefficients  of 
images  are  clustered  together,  but  rarely  they  arc  isolated.  In  [16],  wavelet  coefficients  arc  modified  only 
by  their  evolution  across  scales.  Regular  signal  features  contain  wavelet  coefficients  which  increase  with 
increasing  scale.  Thus,  if  there  exists  a  useful  coefficient  which  is  isolated  in  an  image,  it  is  reasonable 
that  a  coefficient  in  the  same  spatial  location  of  an  increase  in  scale  will  be  sufficiently  large  and  spatially 
supported.  Thus,  the  coefficient  selection  method  provided  by  Equation  3  selects  coefficients  which  are 
sufficiently  large  and  locally  supported  as  well  as  isolated  coefficients  which  arc  sufficiently  large  and 
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supported  by  scale. 

This  type  of  scale-selection  is  consistent  with  the  findings  of  Said  and  Pearl  man  [23],  who  developed  an 
image  codec  based  on  a  ’’spatial  self-symmetry”  between  differing  scales  in  wavelet  transformed  images. 
They  discovered  that  most  of  an  images  energy  is  concentrated  in  the  low-frequency  subbands  of  the 
wavelet  transform.  And  because  of  the  self-symmetry  properties  of  wavelet  transformed  images,  if  a 
coefficient  value  is  insignificant  (i.e.,  of  small  value  or  zero),  then  it  can  be  assumed  that  the  coefficients 
of  higher  spatial  frequency  and  same  spatial  location  will  be  insignificant.  In  our  application,  however,  we 
arc  looking  for  significance  rather  than  insignificance,  so  we  look  to  the  significance  of  lower  frequency 
coefficients  to  determine  significance  of  the  current  coefficient.  In  this  way,  the  preliminary  binary  map 
is  refined  by  both  spatial  and  scalar  support,  given  by  Equation  3. 

The  final  coefficients  retained  for  reconstruction  arc  given  by 

r  r  ,  J  A.jfc[x,y],  when  Jk[x,y]  =  1 

LMx^y\  =  \  •  (4) 

I  0,  else 

The  denoised  image  is  reconstructed  by  synthesizing  the  supported  wavelet  coefficients,  Lmk[l]  using 
the  non-decimated  inverse  wavelet  transform. 

In  general,  natural  and  synthetic  imagery  can  be  compactly  represented  in  few  wavelet  coefficients  of 
large  magnitude.  These  coefficients  are  in  general  spatially  clustered.  Thus,  it  is  useful  to  obtain  selection 
methods  based  on  magnitude  and  spatial  regularity  to  distinguish  between  useful  coefficients  which  are 
representative  of  the  image  and  useless  coefficients  representative  of  noise.  The  two-threshold  criteria 
for  the  rejection  of  noisy  wavelet  coefficients  is  a  computationally  simple  test  for  magnitude  and  spatial 
regularity  which  can  effectively  distinguish  between  useful  and  useless  coefficients. 


B.  Determining  the  r  and  s  Thresholds 

In  determining  the  optimal  threshold  values,  it  is  found  that  both  thresholds  arc  a  function  of  the  noise 
standard  deviation,  an  [1].  Therefore, 

r  =  aTdn  +  hT  (5) 


and 


s  =  [asan  +  hs  J, 


(6) 


where  an  is  an  estimate  of  the  noise,  aT  =  2.12,  bT  =  0.80,  as  =  0.26  and  bs  =  2.81.  The  estimate  of 
the  noise  is  taken  from  that  of  [20]  and  is  given  by 


Median(\Xhhfi[-}\) 

0.6745 


(V) 
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where  A/,/Ko[-]  arc  the  noisy  wavelet  coefficients  of  the  0th  level  and  hh  subband.  For  a  more  detailed 
treatment  of  the  proposed  spatial  denoising  method,  refer  to  [1]. 

III.  Temporal  Denoising  and  Order  of  Operations 

In  this  section,  we  develop  the  principal  algorithm  for  video  denoising.  Additional  mechanisms  re¬ 
quired  by  this  algorithm  will  be  discussed  in  latter  sections. 


A.  Temporal  Domain  Denoising 

Let  us  define  ff  as  a  pixel  of  spatial  location  /  and  frame  z  in  a  given  image  sequence.  The  non- 
decimated  wavelet  transform  applied  in  the  temporal  domain  is  given  by 


and 


where 


Af+iM  =  ^Zg[pWkD[h^k+1p-  z], 

p 

«*+ 1[*.  z\  =  Yl  h\p]akD[i> 2  k+1p  -  z\, 

p 

a3D1[l,z]  =  flz. 


(8) 


(9) 

(10) 


AfM  is  the  high-frequency  wavelet  coefficient  of  spatial  location  l,  frame  z  and  scale  k.  Also, 
afD[l.  z\  is  the  low-frequency  scaling  coefficient  of  spatial  location  l,  frame  z  and  scale  k.  Thus,  multiple 
resolutions  of  wavelet  coefficients  may  be  generated  from  iterative  calculation  of  Equations  8  and  9. 

The  wavelet  function  used  in  the  temporal  domain  denoising  process  is  the  Haar  wavelet  given  by 


h[n] 


n  =  0,l 


V2 

0 


else 


9  M 


-1 

V2' 

1 

y/2’ 

0, 


n  =  0 


n  =  1. 
else 


(11) 


The  decision  to  use  the  Haar  wavelet  is  based  on  experimentation  with  several  other  wavelet  functions  and 
finding  the  greatest  results  with  the  Haar.  The  compact  support  of  the  Haar  wavelet  makes  it  a  suitable 
function  for  denoising  applications.  Because  of  it’s  compact  support,  the  Haar  coefficients  represent  least 
number  of  original  pixels  in  comparison  to  other  types  of  wavelets.  Thus,  when  a  coefficient  is  removed 
because  of  its  insignificance,  the  result  affects  the  smallest  area  of  the  original  signal  in  the  reconstruction. 

Significant  wavelet  coefficients  arc  selected  by  their  magnitude  with  a  threshold  operation. 

A lD [l,  z\,  when  |  A \D [l,  z\\  >  tz 

0,  else 
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where  L^P[l,z]  arc  the  thresholded  wavelet  coefficients  used  in  signal  reconstruction,  and  tz[-\  is  the 
threshold  value.  The  resulting  denoised  video  signal  is  computed  via  the  inverse  non-decimated  wavelet 


transform 

<*1DM  =  2£pMp]a|?iM-2t+1f>] 
+3  £p  <*]£!?!  M-2*+V]  ' 

which  leads  to 

fr3D  =  a3D1[l,z}. 


fl’3D  is  the  temporally  denoised  video  signal. 


(13) 

(14) 


B.  Order  of  operations 


With  a  spatial  denoising  technique  and  a  temporal  denoising  technique  established  in  Sections  II  and 
above,  respectively,  there  still  remains  the  question  of  the  order  of  operations.  The  highest  quality  may 
occur  with  temporal  domain  denoising  followed  by  spatial  domain  (TFS)  denoising,  or  spatial  denoising 
followed  by  temporal  (SFT)  denoising. 

Theoretically,  is  it  not  possible  to  prove  and  determine  which  operation  is  better  because  the  description 
of  the  noise  is  not  known.  However,  it  is  our  hypothesis  that  SFT  denoising  can  more  aptly  determine 
noise  from  signal  information.  The  reasoning  behind  this  hypothesis  is  that  removing  noise  in  the  spatial 
domain  is  a  well  known  process,  and  any  noise  removal  prior  to  temporal  domain  processing  is  helpful 
in  discriminating  between  the  residual  noise  and  motion  in  the  image  sequence.  However,  a  validation  of 
this  hypothesis  is  determined  heuristically. 

Thus,  a  simple  test  is  conducted  with  two  test  video  signals.  The  first  video  signal  is  one  which 
contains  little  motion,  and  the  other  contains  a  great  deal  of  motion.  The  selected  image  sequences  are 
the  ’’CLAIRE”  sequence  from  frame  #104-167  and  the  ’’FOOTBALL”  sequence  from  frame  #33-96. 

Both  of  the  image  sequences  arc  denoised  with  r  and  r,  ranging  from  0  —  30  for  both  TFS  and  SFT 
denoising  operations.  Note  that  in  the  test,  r2  is  a  single  value  and  spatially  independent,  unlike  the 
temporal  threshold  used  in  the  final  denoising  algorithms  tz[-\  which  is  dependent  upon  spatial  position. 
Also,  the  s  parameter  for  feature  selection  in  the  image  denoising  method  described  in  Section  II  is 
calculated  by  taking  Equations  5  and  6  and  solving  for  s.  The  parameter  s  is  given  by: 

s=  —  (r  —  bT)  +  bs  .  (15) 

CLq- 

Also,  the  number  of  resolutions  of  the  non-decimated  wavelet  transform  used  in  both  the  spatial  and 
temporal  denoising  methods  is  k  =  1...5.  The  average  PSNR  of  each  trial  is  recorded.  The  PSNR  of  an 


image  is  given  by: 


PSNRZ 


20log±o 


(16) 
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where 

msez  =  (ft  ~  ft)  •  (17) 

i 

L  is  the  size  of  the  image,  ff  is  the  denoised  pixel  of  spatial  location  l  and  frame  z,  and  ff  is  the 
corresponding  pixel  of  the  original  signal. 

Figure  3  gives  the  results  of  testing.  As  shown  in  Figure  3,  the  highest  average  PSNR  is  achieved 


FOOTBALL  Image  Sequence.  SFT  Denoising.  FOOTBALL  Image  Sequence.  TFS  Denoising. 


XX  xx 

z  z 


CLAIRE  Image  Sequence.  SFT  Denoising.  CLAIRE  Image  Sequence.  TFS  Denoising. 


Fig.  3.  Test  results  of  both  TFS  and  SFT  denoising  methods.  Upper  left:  FOOTBALL  image  sequence,  SFT  denoising,  max. 
PSNR  =  30.85,  t  =  18,  rz  =  12.  Upper  right:  FOOTBALL  image  sequence,  TFS  denoising,  max.  PSNR  =  30.71,  r  =  18, 
tz  =  12.  Lower  left:  CLAIRE  image  sequence,  SFT  denoising,  max.  PSNR  =  40.77,  r  =  19,  tz  =  15.  Lower  right:  CLAIRE 
image  sequence,  TFS  denoising,  max.  PSNR  =  40.69,  r  =  15,  rz  =  21. 

by  SFT  denoising;  first  spatially  denoising  each  frame  of  the  sequence  followed  by  temporal  domain 
denoising.  Thus,  for  the  proposed  denoising  method,  spatial  domain  denoising  occurs  prior  to  temporal 
domain  denoising,  exclusively. 

In  addition  to  a  higher  average  PSNR,  there  is  another  benefit  to  SFT  denoising.  The  level  of  motion 
in  an  image  sequence  is  known  to  be  crucial  in  determining  the  amount  of  noise  reduction  possible  from 

March  22,  2004 


46 


11 


temporal  domain  processing,  and  a  motion  index  calculation  is  inevitably  done  by  comparing  consecutive 
frames  to  one  another.  Thus,  let  us  define  a  noisy  image  sequence  where  is  a  corrupted  pixel  in  spatial 
position  l  and  frame  z  and  is  defined  by 

fl=fl+rtf,  (18) 

where  is  the  noiseless  pixel  value,  and  rjf  is  the  noise  function.  We  can  compare  consecutive  frames 
by  taking  the  difference  as  in  [3,21]  to  find 

fi  ~  fi+1  =  A/f  +  Arjf .  (19) 

Thus  by  taking  the  difference  between  frames  to  find  the  level  of  motion,  the  noise  function  is  subtracted 
from  itself,  in  effect  doubling  the  amount  of  noise  corruption  [25].  Therefore,  by  applying  spatial  de¬ 
noting  prior  to  motion  index  calculation  we  can  reduce  the  value  of  A  rjf  and  provide  a  more  precise 
calculation  of  the  motion  given  in  the  image  sequence. 

IV.  Proposed  Motion  Index 

A  motion  index  is  important  in  the  success  of  a  video  denoising  method  in  order  to  discriminate  be¬ 
tween  large  temporal  variances  in  the  video  signal  which  arc  caused  by  noise  and  large  temporal  variances 
which  arc  caused  my  motion  in  the  original  (noiseless)  signal.  A  motion  index  is  able  to  aid  temporal 
denoising  algorithms  to  eliminate  the  large  temporal  variances  caused  by  noise  while  preserving  the  tem¬ 
poral  variances  caused  by  motion  in  the  original  image  sequence,  creating  a  higher  quality  video  signal. 
That  is,  the  motion  index  is  used  to  determine  tz[-\. 

A.  Motion  Index  Calculation 

Several  works  have  developed  a  motion  estimation  index  to  determine  the  amount  of  temporal  domain 
processing  to  perform,  i.e.,  the  amount  of  information  that  can  be  removed  from  the  original  signal  to  im¬ 
prove  the  overall  quality  [3,21].  However,  none  of  these  proposed  indices  arc  robust  to  noise  corruption, 
which  is  an  important  feature  in  a  motion  index.  There  arc  a  few  characteristics  that  a  motion  index  must 
possess.  One,  a  motion  index  should  be  a  localized  value.  The  reasoning  behind  a  localized  motion  index 
is  because  the  amount  of  motion  may  vary  in  different  spatial  portions  of  an  image  sequence.  Thus  the 
motion  index  should  be  able  to  identify  those  differences.  Two,  a  motion  index  needs  to  be  unaffected 
by  the  amount  of  noise  corruption  in  a  given  video  signal.  A  motion  index  should  be  robust  to  noise 
corruption  to  aptly  determine  the  proper  amount  of  temporal  domain  processing. 

Thus,  a  localized  motion  index  is  developed  which  is  relatively  unaffected  by  the  level  of  noise  corrup¬ 
tion  in  the  original  image  sequence.  A  spatially  averaged  temporal  standard  deviation  (SATSD)  is  used  as 
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the  index  of  motion.  Spatial  averaging  is  used  to  remove  the  noise  inherent  in  the  signal,  and  the  temporal 
standard  deviation  is  used  to  detect  the  amount  of  activity  in  the  temporal  domain. 

Let  us  define  fi'2D  as  pixel  value  in  the  spatial  location  I  of  the  zth  frame  of  an  image  sequence  already 
processed  by  the  2D  denoising  method  of  [1].  The  spatial  averaging  of  the  spatially  denoised  signal  is 
given  by 


At  =  £  /. 

i£l 


z, 2D 


(20) 


where  I  is  the  set  of  spatial  locations  which  form  a  square  area  centered  around  spatial  location  l,  and 
B2  is  the  number  of  spatial  locations  contained  in  /;  typically,  B  =  15.  The  value  of  B  must  be  an  odd 
value  to  allow  for  the  square  area  to  set  centrally  around  spatial  location  l.  This  average  is  used  to  find 
the  standard  deviation  in  the  temporal  domain. 


£4, 


(21) 


and 


Mj  = 


N 


(22) 


i=  1 


Mi  is  the  localized  motion  index,  F  is  the  number  of  frames  in  the  image  sequence,  and  m  is  the  temporal 
mean  of  the  spatial  average  at  location  l. 


B.  Motion  Index  Testing 

The  FOOTBALL  and  CLAIRE  image  sequences  are  used  once  more  to  test  the  proposed  motion  index 
as  well  as  the  motion  index  given  in  [3],  and  two  specific  spatial  locations  are  selected  from  each  sequence: 
a  location  where  there  is  little  to  no  motion  present,  and  a  location  where  motion  is  present.  A  frame  from 
each  of  the  two  image  sequences  is  given  in  Figure  4,  and  the  four  spatial  locations  for  evaluation  of  the 
proposed  motion  index  are  highlighted. 

The  two  sequences  arc  corrupted  with  various  levels  of  noise,  and  the  motion  is  estimated  at  each  of 
the  four  spatial  locations  selected  with  both  the  proposed  motion  index  and  that  of  [3],  The  results  of 
the  motion  index  used  in  [3]  is  given  in  Figure  5.  As  shown  in  Figure  5,  the  motion  index  of  [3]  is  not 
robust  to  noise  corruption.  That  is,  the  motion  calculation  from  the  same  spatial  location  increases  with 
an  increase  in  noise.  Also,  the  motion  index  shows  the  FOOTBALL  image  sequence  (x  =  300,  y  =  220) 
as  having  a  higher  motion  index  than  the  CLAIRE  image  sequence  (x  =  40,  y  =  200)  with  zero  noise 
corruption.  However,  the  motion  index  shows  the  opposite  results  with  higher  levels  of  noise.  Thus,  the 
motion  index  gives  conflicting  results  with  the  introduction  of  noise. 

The  results  of  the  proposed  SATSD  motion  index  arc  given  in  Figure  6.  As  shown  in  Figure  6,  the 
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Fig.  4.  Spatial  positions  of  motion  estimation  test  points.  Left:  FOOTBALL  image  sequence,  frame  #96.  Right:  CLAIRE 
image  sequence,  frame  #167 


Local  Motion  Estimate  of  [3]  for  Varying  Noise  Levels 


Fig.  5.  Motion  estimate  given  in  [3]  of  image  sequences,  CLAIRE  and  FOOTBALL. 

proposed  motion  index  is  much  more  robust  to  varying  noise  levels,  and  the  order  of  locations  from 
highest  to  lowest  motion  is  what  one  would  believe  is  correct.  The  location  with  the  lowest  motion  index 
is  in  the  CLAIRE  image  sequence  where  there  is  no  camera  motion,  and  there  are  no  moving  objects 
in  that  spatial  location.  The  next  lowest  motion  location  is  in  the  FOOTBALL  image  sequence  in  the 
spatial  location  where  there  are  no  moving  objects.  However,  there  is  some  slight  camera  motion  in  the 
sequence,  so  the  motion  index  is  slightly  higher  than  in  the  CLAIRE  image  sequence.  The  location  with 
the  next  highest  motion  index  is  the  center  of  the  CLAIRE  image  sequence,  where  there  is  some  motion 
due  to  movement  of  the  head,  and  the  location  with  the  highest  motion  index  is  the  FOOTBALL  image 
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Proposed  Local  Motion  Estimate  for  Varying  Noise  Levels 


Claire  image  sequence  (frames  104-167),  pos:  x=40,  y=200 
Claire  image  sequence  (frames  104-167),  pos:  x=180,  y=144 

-  Football  image  sequence  (frames  33-96),  pos:  x=300,  y=220 

-  Football  image  sequence  (frames  33-96),  pos:  x=160,  y=120 


20 

Noise  Std.  Dev. 


Fig.  6.  Proposed  motion  estimate  of  image  sequences,  CLAIRE  and  FOOTBALL, 
sequence  in  the  spatial  location  where  many  objects  cross. 

V.  Temporal  Domain  Parameter  Selection 

The  amount  of  temporal  denoising  which  is  beneficial  to  an  image  sequence  is  dependent  upon  the 
amount  of  noise  corruption  as  well  as  the  amount  of  motion.  Thus,  the  threshold  tz[1]  is  given  by 

Tz[l\  =  adn  + (3Mi  (23) 

where  Mi  is  the  motion  index  of  spatial  position  l,  and  a„  is  the  estimated  noise  standard  deviation  of  the 
image  sequence.  The  two  parameters  a  and  (3  are  determined  experimentally  using  test  image  sequences. 

In  the  proposed  coefficient  selection  method,  we  use  a  training  sample  approach.  The  approach  starts 
with  a  series  of  test  image  sequences  serving  as  training  samples  to  derive  the  functions  which  determine 
the  optimal  set  of  the  values  for  a  and  (3.  Theoretically,  we  may  represent  each  training  sample  as  a  vector 
V{,  i  =  l,n.  Those  training  samples  should  span  a  space  which  covers  more  corrupted  image  sequences 
than  the  training  samples: 

S  =  Span{Tj;?'  =  l,..n}.  (24) 

The  original  data  and  the  statistical  distribution  of  the  noise  are  given  for  each  of  the  training  samples 
which  arc  corrupted.  The  optimal  set  of  parameters  can  then  be  determined  for  the  training  samples  using 
the  approach  described  earlier.  Ideally,  the  space  spanned  by  the  training  samples  contains  the  type  of  the 
corrupted  image  sequences  which  are  to  be  denoised.  As  a  result,  the  same  set  can  generate  an  optimal 
or  close  to  optimal  performance  for  the  corrupted  image  sequences  of  same  type.  It  is  clear  that  more 
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training  samples  will  generate  parameters  suitable  for  more  types  of  image  sequences,  while  a  space  of 
fewer  training  samples  is  suitable  for  fewer  types  of  image  sequences. 

In  order  to  obtain  an  estimate  of  the  noise  level,  a n,  an  average  is  taken  from  the  noise  estimates  of 
each  frame  in  the  image  sequence,  given  by  Equation  7.  It  is  reasonable  to  assume  an  IID  (Independent, 
Identically  Distributed)  model  for  the  level  of  noise  for  each  pixel  position  since  noise  in  each  pixel  posi¬ 
tion  is  generated  by  individual  sensing  units  of  the  image  sensor  such  as  CCD  [8]  which  arc  independent. 
As  a  result,  the  estimate  of  the  standard  deviation  of  the  noise  ( an )  in  each  image  also  represents  the 
standard  deviation  of  the  noise  in  the  temporal  domain.  Therefore,  we  can  use  the  estimate  of  the  noise 
in  the  spatial  domain  to  estimate  that  in  the  temporal  domain. 

It  should  be  pointed  out  that  after  the  denoising  has  occurred  in  the  spatial  domain  using  the  SFT 
method,  the  standard  deviation  of  the  noise  is  significantly  reduced.  That  reduction  is  statistically  equal 
to  each  frame.  As  a  result,  the  estimated  noise  in  the  spatial  domain  can  still  be  nominally  used  for  noise 
reduction  in  the  temporal  domain  as  the  reduction  of  an  can  be  automatically  absorbed  by  a. 

The  sequences  CLAIRE,  FOOTBALL,  and  TREVOR  are  used  for  a  and  (3  selection.  Each  of  the 
image  sequences  arc  corrupted  with  differing  levels  of  noise  corruption  ( an  =  10,  20)  and  denoised  with 
the  SFT  denoising  method  where  Equation  23  is  used  as  the  temporal  domain  threshold.  Values  of  a 
and  f3  arc  used  ranging  from  a  =  0  to  3.0  and  j3  =  —0.3  to  0.3.  The  results  of  this  testing  is  given  in 
Figure  7.  As  shown  in  Figure  7  the  maximum  average  PSNR  is  achieved  when  a  =  0.9  and  /3  =  —0.11. 

Average  PSNR  for  image  sequences  used  in  test  varying  a  and  p 


Fig.  7.  a  and  (3  parameter  testing  for  temporal  domain  denoising. 

The  result  is  reasonable,  of  course,  because  as  the  motion  increases  in  an  image  sequence  the  redundancy 
between  frames  decreases,  and  the  benefits  of  temporal  domain  processing  decrease.  Thus,  as  the  testing 
has  shown,  the  temporal  domain  threshold  decreases  as  the  motion  increases. 

March  22,  2004 


51 


16 


VI.  Experimental  Results 

The  proposed  video  denoising  algorithm  first  is  applied  to  each  of  the  video  frames  individually  and 
independently.  The  method  of  [1]  was  developed  earlier  by  our  previous  research  to  denoise  images,  and 
is  used  as  the  spatial  denoising  portion  of  the  wavelet-based  video  denoising  algorithms. 

The  video  signal  is  then  denoised  in  the  temporal  domain  by  the  method  developed  in  Sections  III  and 
V.  The  temporal  denoising  algorithm  is  a  selective  shrinkage  algorithm  which  uses  a  proposed  motion 
estimation  index  to  determine  the  temporal  threshold,  tz[-\.  The  temporal  threshold  is  modified  by  the 
motion  index  to  effectively  eliminate  temporal  domain  noise  while  preserving  important  motion  informa¬ 
tion. 

Three  image  sequences  arc  used  to  determine  the  effectiveness  of  the  proposed  video  denoising  method. 
They  arc  the  SALESMAN  image  sequence,  the  TENNIS  image  sequence,  and  the  FLOWER  image  se¬ 
quence.  These  three  sequences  arc  all  corrupted  with  various  levels  of  noise  and  denoised  with  the 
methods  of  [1,  3,21,  30]  as  well  as  the  proposed  method.  Please  note  that  only  the  temporal  domain 
denoising  algorithm  of  [21]  is  being  tested.  The  spatial  domain  denoising  methods  of  [1]  is  used  for  all 
the  wavelet-based  video  denoising  methods.  The  results  are  given  in  Figures  8  through  13.  As  shown 


SALESMAN  image  sequence,  std.  =  1 0 


Fig.  8.  Denoising  methods  applied  to  the  SALESMAN  image  sequence,  std.  =  10 

in  Figures  8  through  13,  the  proposed  method  consistently  outperforms  the  other  methods  presented.  In 
all  cases,  the  proposed  denoising  method  has  a  higher  average  PSNR  then  all  other  denoising  methods 
tested.  Also,  note  that  in  the  method  of  [21],  the  threshold  T  changes  due  to  video  content  and  noise  level 
to  obtain  the  highest  average  PSNR  using  that  particular  method.  In  the  proposed  method,  the  temporal 
domain  threshold  is  automatically  calculated  due  to  estimates  of  the  noise  level  and  motion. 

Figures  14  through  19  give  an  example  of  the  effectiveness  of  each  of  the  denoising  methods.  Figure  14 
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SALESMAN  image  sequence,  std.  =  20 


Fig.  9.  Denoising  methods  applied  to  the  SALESMAN  image  sequence,  std.  =  20 


TENNIS  image  sequence,  std.  =  10 


Fig.  10.  Denoising  methods  applied  to  the  TENNIS  image  sequence,  std.  =  10 

gives  the  original  frame  #7  of  the  SALESMAN  image  sequence,  and  Figure  15  gives  frame  #7  corrupted 
with  noise.  Frames  16  through  19  give  frame  #7  denoised  by  each  of  the  methods  mentioned  in  the 
section. 


VII.  Conclusions 

In  this  paper,  a  new  combined  spatial  and  temporal  domain  wavelet  shrinkage  method  is  developed 
for  the  removal  of  noise  in  video  signals.  The  proposed  method  uses  a  geometrical  approach  to  spatial 
domain  denoising  to  preserve  edge  information,  and  a  newly  developed  motion  estimation  index  for 
selective  wavelet  shrinkage  in  the  temporal  domain. 
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TENNIS  image  sequence,  std.  =  20 


Fig.  11.  Denoising  methods  applied  to  the  TENNIS  image  sequence,  std.  =  20 


FLOWER  image  sequence,  std.  =  1 0 


Fig.  12.  Denoising  methods  applied  to  the  FLOWER  image  sequence,  std.  =  10 

The  spatial  denoising  technique  is  a  selective  wavelet  shrinkage  algorithm  developed  in  [1]  and  is 
shown  to  outperform  other  wavelet  shrinkage  denoising  algorithms  given  in  the  literature  both  in  denoised 
image  quality  and  computation  time. 

The  temporal  denoising  algorithm  is  also  a  selective  wavelet  shrinkage  algorithm  which  uses  a  motion 
estimation  index  to  determine  the  level  of  thresholding  in  the  temporal  domain. 

The  proposed  motion  index  is  experimentally  determined  to  be  more  robust  to  noise  corruption  than 
other  methods,  and  is  able  to  help  determine  the  threshold  value  for  selective  wavelet  shrinkage  in  the 
temporal  domain.  With  the  motion  index  and  temporal  domain  wavelet  shrinkage,  the  proposed  video 
denoising  method  is  experimentally  proven  to  outperform  other  methods  given  in  the  literature  for  various 
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FLOWER  image  sequence,  std.  =  20 


Fig.  13.  Denoising  methods  applied  to  the  FLOWER  image  sequence,  std.  =  20 


Fig.  14.  Original  frame  #7  of  the  SALESMAN  image  sequence 


levels  of  noise  corruption  applied  to  video  signals  with  varying  amounts  of  motion. 


March  22,  2004 


55 


20 


Fig.  15.  SALESMAN  image  sequence  corrupted,  std.  =  20,  PSNR  =  22.10 


Fig.  16.  Results  of  the  3D  K-nearest  neighbors  filter,  [30],  PSNR  =  28.42 
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Abstract —  This  paper  presents  a  novel  feature  selection  method 
which  is  named  Filtered  and  Supported  Sequential  Forward 
Search  (FS_SFS)  in  the  context  of  Support  Vector  Machines 
(SVM).  In  comparison  with  conventional  wrapper  methods  em¬ 
ploying  the  sequential  forward  search  (SFS)  strategy,  it  has  two 
important  features  that  reduce  the  computation  time  of  SVM 
training  during  the  feature  selection  process.  First,  in  stead  of 
utilizing  all  the  training  samples  to  obtain  the  classifier,  FS  SFS, 
by  taking  advantage  of  the  existence  of  support  vectors  in  SVM, 
dynamically  maintains  an  active  data  set  for  each  SVM  to  be 
trained  on.  In  this  way  the  computational  demand  of  a  single 
SVM  training  decreases.  Secondly,  a  new  criterion,  in  which 
discriminant  ability  of  individual  features  and  the  correlation 
between  them  are  both  taken  into  consideration,  is  proposed 
to  effectively  filter  out  non-essential  features  before  every  SFS 
iteration  begins.  As  a  result,  the  total  number  of  training  is 
significantly  reduced.  The  proposed  approach  is  tested  on  both 
synthetic  and  real  data  to  demonstrate  its  effectiveness  and 
efficiency. 

Index  Terms —  feature  selection,  sequential  forward  search  (SFS), 
support  vector  machines  (SVM),  FS_SFS. 

I.  Introduction 

Feature  dimensionality  reduction  is  of  considerable  im¬ 
portance  for  two  primary  reasons:  reduce  the  computational 
complexity  and  improve  the  classifier’s  generalization  ability. 
Feature  selection  addresses  the  dimensionality  reduction  prob¬ 
lem  by  determining  which  subset  of  those  features  are  most 
essential  for  classification.  Based  on  the  criterion  for  subset 
evaluation,  feature  selection  approaches  can  be  grouped  into 
two  categories:  filter  methods  and  wrapper  methods  [1].  Ac¬ 
quiring  no  feed  back  from  classifiers,  filter  methods  estimate 
the  classification  performance  by  some  indirect  assessment 
such  as  distance  measures.  Wrapper  methods,  on  the  contrary, 
are  classifier-dependent.  They  evaluate  the  “goodness”  of  the 
selected  feature  subset  directly  based  on  the  classification 
accuracy,  which  would  intuitively  yield  better  performance.  As 
a  matter  of  fact,  experimental  results  are  in  general  reported 
in  favor  of  the  wrapper  methods  [1]  [2]  even  though  more 
computational  cost  is  needed. 

As  a  state-of-art  classifier.  Support  Vector  Machines  (SVM) 
has  been  successfully  applied  in  a  variety  of  areas  [3]— [5] . 
However,  given  the  fact  that  training  just  a  single  SVM  would 
impose  a  lot  of  computation  when  the  number  of  training 
samples  is  large,  the  integration  of  SVM  and  wrapper  methods, 
which  calls  for  multiple  times  of  SVM  training  process,  might 


be  computationally  infeasible.  In  this  paper  we  present  a 
expedited  wrapper  method  for  SVM  which  is  named  Filtered 
and  Supported  Sequential  Forward  Search  (FS  SFS).  As  its 
name  suggests,  this  new  wrapper  feature  selection  method 
employs  sequential  forward  search  strategy  (SFS),  but  it  has 
the  following  advantages  over  the  conventional  wrapper/SFS 
method: 

1)  FS  SFS  combines  the  advantages  of  filter  and  wrapper 
methods  by  introducing  a  filtering  process  for  each  SFS 
iteration; 

2)  FS_SFS  introduces  a  new  criterion  that  is  computation¬ 
ally  simple  and  considers  both  discriminant  ability  of 
individual  features  and  the  correlation  between  them; 

3)  FS  SFS  improves  the  efficiency  of  obtaining  a  single 
SVM  classifier  by  dynamically  maintaining  an  small 
active  training  set. 

The  rest  of  the  paper  is  organized  as  follows.  Section  II 
gives  a  brief  introduction  of  SVM  and  Section  III  explains 
FS  SFS  in  detail.  Experimental  results  are  given  in  section  IV 
followed  by  conclusions  and  discussions  in  section  V. 

II.  Support  Vector  Machines 

SVM  is  a  state-of-art  learning  machine  based  on  the  struc¬ 
tural  risk  minimization  induction  principle.  Here  we  only  give 
a  very  brief  review  while  the  detailed  description  can  be  found 
in  [6].  Consider  N  training  sample  pairs 

{X(l),  Y(l)}, {X(2),Y(2)}, . .  .,{X(N),Y(N)}, 

where  X  (i)  is  a  k-dimensional  feature  vector  representing  the 
ith  training  sample,  and  Y(i)  £  {—1,1}  is  the  class  label  of 
X{i). 

A  hyperplane  in  the  feature  space  can  be  described  as 
the  equation  W  ■  X  +  b  =  0,  where  W  £  Rk  and  b  is 
a  scalar.  When  the  training  samples  are  linearly  separable, 
SVM  yields  the  optimal  hyperplane  that  separates  two  classes 
with  no  training  error  and  maximizes  the  minimum  distance 
from  a  point  X(i )  to  the  hyperplane  by  solving  the  following 
optimization  problem: 

Minimize  :  /(IF)  =  \  ||IF||2 

Subject  to:  Y(i)  (  W  ■  X(i)  +  b)  >  1,  i  =  1, ...,  N.  (1) 

For  linearly  nonseparable  cases,  there  is  no  such  a  hyperplane 
that  is  able  to  classify  every  training  point  correctly.  However 
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wrapper  part 


(a) 

Fig.  1.  The  outline  of  the  proposed  method  for  feature  selection  for  SVM. 


the  previous  idea  can  be  generalized  by  introducing  the 
concept  of  soft  margin.  Thus  the  new  optimization  problem 
becomes 

Minimize:  f(W,  £)  =  \  \\W\\2  +  C'E^=i£(*) 

Subject  to  :  Y(i)(W  ■  X(i)  +  b)  >  1  -  =  1, . . .  N(2) 

where  £(z),  is  called  a  slack  variable  and  related  to  the  soft 
margin.  Both  optimization  problems  (1)  and  (2)  can  be  solved 
by  introducing  the  Lagrange  multipliers  a(i)  that  reduces  them 
to  quadratic  programming  problems. 

In  the  classification  phase,  a  point  X  in  the  feature  space 
is  assigned  a  label  Y  according  to  the  following  equation: 

Y  =  sgn[W-X  +  b] 

=  sgn[E!Ii«(*)y(*)(^(*)  -X)  +b}.  (3) 

III.  FS_SFS:  Filtered  and  Supported 
Sequential  Forward  Search 

A.  Algorithm  Review  of  FSJSFS 

The  outline  of  the  proposed  method  is  shown  in  Fig.  1.  The 
filtering  part  in  our  approach,  acting  in  the  generic  way  similar 
to  a  filter  method,  ranks  features  without  involving  the  clas¬ 
sifier.  The  features  with  relatively  high  ranks  are  considered 
as  “informative”  feature  candidates  and  then  re-studied  by  the 
wrapper  part  that  further  investigates  their  contributions  to  a 
specific  classifier.  This  combinational  framework  delivers  as 
good  performance  as  the  conventional  wrapper  method  but  is 
computationally  simpler. 

Now  with  the  framework  determined,  the  feature  selection 
problem  is  reduced  to  a  search  problem  to  find  the  optimal  sub¬ 
set  [7],  Many  search  strategies  have  been  proposed  [8]— [10], 
and  we  adopt  a  suboptimal  search  method  called  sequential 
forward  search  (SFS)  [10]  algorithm  for  its  simplicity  and 
effectiveness  proven  in  many  applications.  In  the  following 
three  subsections,  we  will  explain  how  FS  SFS  works  in  detail. 

B.  F  SFS:  Filtered  SFS  Using  a  New  Criterion 

Evidently  an  effective  filtering  criterion  is  needed  since  it  is 

undesirable  if  many  informative  features  are  discarded  by  the 
filtering  process.  Also  the  criterion  should  be  simple  to  avoid 
excessive  computational  cost.  To  address  this  problem,  we 
propose  the  following  new  filtering  criterion,  which  considers 
both  the  discriminant  ability  of  individual  features  as  well  as 
the  correlation  between  them.  Also  it  is  simple  to  calculate. 


Suppose  we  have  obtained  a  feature  combination  Fs  = 
{/„!,  fn2, . . . ,  fSd}  and  need  to  add  one  more  feature.  Then 
the  score  for  a  feature  f,  is  computed  as  follows. 


1)  discriminant  ability  of  individual  features 

The  discriminant  ability  of  feature  f  is  described  by 


|  mn\  —  ml2 1 
std\  +  stdl2  ’ 


(4) 


where  m\  and  std\  (m\  and  .std\)  are  the  mean  and 
standard  deviation  of  the  samples  belonging  to  class  1 
(-1)  when  only  feature  /,  is  considered. 

2)  correlation  between  features 

First  we  define  the  correlation  coefficient  ptj  between 
two  features,  say  ,/j  and  fj. 


C— 1 


C=1 


:(^Sc(fi)^J  •  var^5c(/,)^ 


(5) 


where  Sc(fi )  =  {&/<  (01^(0  =  c}  is  the  vectors  that 
represented  by  feature  /,  and  labeled  as  class  c. 

Then  based  on  p,r  we  define  the  correlation  coefficient 
between  ft  and  Fs  as 

Pi,Fd  =  max  pij.  (6) 


It  is  desirable  to  select  the  features  that  can  individually 
separate  the  classes  well  but  has  small  correlation  with  the 
feature  set  that  has  been  obtained.  Thus  the  final  score  assigned 
to  fi  is  defined  as: 

Ri,Fs  =  I  Pi,Fs\ - 77TT’  (7) 

max  {D,} 


where  D *  is  normalized  such  that  it  is  in  the  same  range  as 
I  Pi,Fs\- 


C.  S  SFS:  Supported  SFS  in  the  Context  of  SVM 

In  SVM  there  is  an  special  group  of  training  samples  named 
“support  vectors”,  whose  corresponding  coefficients  a(i)  in 
Eq.  (3)  are  non-zeros.  In  other  words,  samples  other  than 
support  vectors  have  no  contribution  to  determining  the  deci¬ 
sion  boundary.  Since  usually  the  number  of  support  vectors  is 
relatively  small,  we  could  train  SVM  just  by  using  the  support 
vectors.  Following  this  idea,  we  propose  the  supported  SFS 
algorithm,  which  dynamically  maintains  an  active  training 
set  as  estimated  candidates  of  the  support  vectors,  and  trains 
SVM  using  this  reduced  subset  rather  than  the  whole  original 
training  set.  In  this  way,  we  are  able  to  find  the  boundary  with 
less  computational  cost. 

The  procedure  of  S_SFS  is  described  as  follows.  The  first 
step  is  to  select  the  best  single  feature.  To  do  so,  we  train  SVM 
k  times,  each  of  which  uses  all  the  training  pairs  available  but 
only  considers  the  individual  feature  /,.  Mathematically  the 
initial  feature  combination  set  is  F{  =  fi,fi  £  F,  and  the 
active  training  set  is  Vj*  =  {1,2,..., TV}. 

Although  in  this  step  every  training  pair  in  S  is  evolved  in 
this  initial  training  task,  the  computational  complexity  is  not 
high  because  the  input  vector  is  just  one-dimensional.  After 
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the  training,  each  single -feature  combination  F[  is  associated 
with  a  margin  value  M\  and  a  group  of  support  vectors  V{. 
The  feature  that  yields  the  smallest  margin 


j  =  arg  min  M\  (8) 

is  then  chosen  as  the  best  single  feature.  Thus  we  obtain  the 
initial  feature  combination  F\  =  {fj}  and  its  active  training 
set  V\  —  {fj}  for  the  next  step. 

At  step  n,  we  have  already  obtained  the  feature  combination 
Fn  that  contains  n  features,  and  the  active  training  set  Vn. 
To  choose  one  more  feature  into  the  feature  combination  set, 
we  add  each  remaining  feature  f,  one  by  one  and  construct 
the  corresponding  active  training  set  for  every  new  feature 
combination  as  follows: 

f  F‘+1  =  Fn  U  {/<},  for  n  G  F*\ 

1  4  =  y-uW. 

where  F“v  =  {fr  \  fr  G  F  and  fr  qL  Fn}  is  the  collection  of 
the  available  features  to  be  selected  from. 

For  each  T‘+1  we  train  SVM  using  the  samples  in  Vr'l+l. 
The  resulting  margin  and  the  collection  of  the  support  vectors 
are  denoted  as  M^+1  and  SV*+1,  respectively.  Then  the 
feature  fj  that  yields  the  combination  with  the  least  margin  as 

j  =  arg  min  M*+1  (10) 

h  eF“» 


is  selected,  and  accordingly  the  new  feature  combination  Fn+ 1 
and  new  active  training  set  Vn+ 1  are  obtained  as  follows: 


f  Fn- i-i  —  Fl+1, 
1  Vn+i  =  SV^+j 


(11) 


The  SFS  process  continues  until  no  significant  margin  reduc¬ 
tion  is  found  or  the  desired  number  of  features  is  obtained. 


D.  FSXFS:  the  Integration  of  F_SFS  and  SSFS 

The  integration  of  F_SFS  and  S_SFS  is  quite  straightforward 
for  which  the  basic  idea  is  discarding  the  features  with  high 
scores  that  is  computed  according  to  Eq.  (7)  to  reduce  the 
number  of  features  S_SFS  has  to  evaluate.  Again  suppose  we 
are  at  step  n  of  SFS  with  F„  and  Vn  available,  and  FS_SFS 
works  as  follows: 

1)  calculate  the  score  RtiFr,  for  each  remaining  feature  _/); 

2)  select  Kn  lowest  scored  features  to  construct  Ffv ; 

3)  determine  the  next  feature  to  be  added  using  Eq.  (9)  and 
Eq.  (10); 

4)  update  the  active  training  set  using  Eq.  (11). 

I\n  here  is  the  tuning  parameter  to  balance  between  the 
performance  and  the  algorithm  complexity.  In  our  experiments, 
Kn  is  set  to  L^J  such  that  half  of  the  available  features  are 
discarded  at  every  SFS  iteration  step. 

IV.  Experimental  Results 

In  the  experiments,  the  proposed  feature  selection  method 
is  applied  to  both  synthetic  and  real-world  data  sets.  For  all 
the  experiments,  the  SVM  optimization  is  achieved  by  using 
SVMTorch  [11], 


A.  Results  on  Synthetic  Data 

Three  series  of  experiments  are  carried  out  on  the  synthetic 
data  sets,  and  for  each  experiment  we  sample  N  vectors  X  = 
(xi,X2,  •  •  • ,  Xk )  from  two  classes  (class  1  or  class  -1)  in  a  k- 
dimensional  data  space.  The  components  x.(  are  independent 
Gaussian  variables  whose  distributions  are  designed  as: 

F-  explf^F),  if  X  belongs  to  class  1; 

y  ZTltTi  v  i  7  /i  y 

exp(^f^),  if  X  belongs  to  class  -1, 

where  Oi  =  0.5  •  26_1)  and  i  =  1,2, . . . ,  k. 

The  first  experiment  is  a  2-D  case  where  k  =2  and  N  = 
100.  Fig.  2  shows  how  the  active  training  set  changes  when 
features  are  added  one  by  one  into  the  candidate  feature  set 
F.  FS_SFS  is  also  tested  in  a  3-D  case  where  k  =  3  and 
N  =  100.  In  both  2-D  and  3-D  scenarios,  we  observe  that 
with  our  experiment  setting  FS  SFS  and  the  conventional  SVS 
methods  generate  exactly  the  same  support  vectors  . 

In  the  third  experiment,  we  test  FS_SFS  in  a  10-dimensional 
case  where  A:  =  10  and  N  =  250.  According  to  Eq.  (12),  if 
i  <  j  the  variance  of  feature  Xi  is  larger  than  that  of  x3 ,  and 
therefore  x,  has  more  discriminant  ability.  For  that  reason,  we 
expect  Xi  to  be  selected  before  x3 .  For  display  purpose,  we 
assign  a  feature  x,;  a  score  as  11  —  pos(xj),  where  pos(xj)  is 
the  order  of  x,  selected.  For  example,  if  xt  is  the  number  one 
selected  feature  component,  its  score  would  be  10.  Fig.  3(a) 
gives  the  ideal  score  of  x^.  Fig.  3(b)  and  Fig.  3(c)  show  the 
scores  of  features,  which  are  averaged  over  100  trials,  when 
SFS  and  FS  SFS  are  applied,  respectively.  As  one  can  see, 
FS_SFS  is  able  to  achieve  similar  results  of  SFS  with  lower 
computational  cost. 

B.  Results  on  Real-World  Data 

The  proposed  algorithm  is  applied  to  four  real-world  data 
sets  obtained  from  the  widely-used  UCI  (University  of  Cali¬ 
fornia,  Irvine)  repository  of  machine  learning  [12],  These  data 
sets  are: 

1)  the  BUPA  Liver  Disorders  data  set  (BUPA  Liver)  which 
contains  354  instances  with  6  features; 

2)  the  Wisconsin  Breast  Cancer  data  set  (BCW)  which 
contains  683  instances  with  9  feature; 

3)  the  data  of  letter  ’A’  and  ’B’  from  Letter  Image  Recogni¬ 
tion  data  set  (A-B-letter)  which  contains  1555  instances 
with  16  feature; 

4)  the  Johns  Hopkins  University  Ionosphere  data  set  (Iono¬ 
sphere)  which  contains  351  instances  with  34  feature. 

For  each  data  set  we  randomly  set  aside  20%  instances  as 
the  testing  samples,  and  the  rest  as  the  training  samples.  The 
results  are  listed  in  Table  I.  As  one  can  see,  FS  SFS  improves 
the  efficiency  of  SFS  without  sacrificing  the  accuracy  of  either 
the  selection  or  the  classification. 

V.  Conclusions 

In  this  paper,  we  present  a  novel  feature  selection  method 
for  SVM.  By  introducing  a  feature  pruning  process,  we  filter 
out  “uninformative”  features  to  reduce  the  required  number  of 
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(a)  (b)  (c) 

Fig.  2.  The  active  training  set  (circled)  maintained  by  S_SFS  of  a  2-D  case,  (a)  v\,  which  is  the  support  vectors  obtained  by  considering  only  feature  x\. 
(b)  V2,  which  is  the  support  vectors  obtained  by  considering  only  feature  x 2.  (c)  The  support  vectors  obtained  by  training  SVM  on  V  =  v\  U'  V2- 


(a)  (b)  (c) 

Fig.  3.  The  score  of  feature  components,  (a)  The  ideal  scores,  (b)  The  scores  obtained  by  using  SFS.  (b)  The  scores  obtained  by  using  FS_SFS. 

TABLE  I 

Comparison  of  Classification  Accuracy  and  Run  Time  between  FS.SFS  and  SFS  over  10  trials. 


number  of 

features 

classification  accuracy  (%) 

Run 

Time  (seconds) 

available 

selected 

training 

testing 

FS-SFS 

SFS 

FS-SFS/ SFS 

FS-SFS 

SFS 

FS-SFS 

SFS 

BUPA  Liver 

6 

4.6 

78.7% 

78.5% 

70.2% 

70.7% 

4.31 

6.08 

71% 

BCW 

9 

5.5 

97.4% 

97 . 4% 

96.3% 

95.4% 

10.61 

13.31 

79.7% 

A-B  Letter 

16 

6.2 

99.95% 

100% 

99.7% 

99.8% 

48.8 

65.0 

72% 

Ionosphere 

34 

10.0 

98.9% 

99.3% 

92.0% 

90.6% 

81.5 

118.9 

68.5% 

training.  We  also  develop  a  new  feature  ranking  criterion,  in 
which  not  only  the  class  separability  of  individual  features  but 
also  the  correlation  between  features  are  taken  into  account,  to 
make  the  pruning  process  more  effective.  Furthermore,  during 
the  SFS  searching  process,  an  active  training  set  is  maintained 
as  the  estimated  candidates  of  the  support  vectors.  Whenever 
SVM  has  to  be  trained,  it  is  done  over  the  reduced  training  set. 
In  this  way,  the  number  of  samples  participating  in  a  single 
optimization  procedure  decreases  and  therefore  the  training 
process  is  expedited.  We  test  the  proposed  method  on  both 
artificial  and  real-world  data  sets,  and  the  experimental  results 
demonstrate  its  effectiveness  and  efficiency. 
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